This question already has answers here:
Ordered list with O(1) random access and removal
(2 answers)
Data structure that allows accessing elements by index and delete them in O(1)
(4 answers)
Closed 4 years ago.
I'm looking for a data structure in Java, that has the following properties:
Deletion in O(1) time using the index inside the structure, while maintaining the relative order of the elements (sorted initially).
Addition, only at the end of the structure.
No updates are required.
Single traversal after all deletions.
Options that I've tried:
Array: Cannot delete in O(1) time, as shifting is required. Plus, if I use a HashSet of deleted (or not deleted) elements, I would still have to go through the deleted elements once while traversing the array.
Linked List: Deletion is O(1) (if you have a reference to the Node to be deleted, and it's a doubly linked list, preferably), and shifting is not necessary. But there is no indexing, so I have to traverse from the start, to determine the Node that has to be deleted.
TreeSet: A TreeSet can maintain the order, but deletion is O(log n) rather than O(1), and it works via the element, not via the index inside the structure.
I'm looking for a data structure that can help me in the tasks mentioned above, if possible, in Java. If it is not built-in, then I would like to know the implementation of the said data structure.
The need:
I was trying to solve this question. A string of English characters is given initially, then a number of operations are to be performed on the string. Each operation has a character c and a number n alongside, which means that we have to delete the nth occurrence of the character c.
My solution:
I would create an array of type X (the data structure I am looking for), of length 26 (for each character). I would add each occurrence of a character, say d, in the 3rd slot (starting from 0), in objects that contain the index in the String itself. I would do this for all the characters of the String. This would take a total time of O(n), if the length of the string is n.
Once this is done, I would start processing the queries. Each query requires us to delete the nth occurrence of the character c (a variable, not the actual English character c), which we can do in O(1) time (as required). So all the deletions together would take O(q) time, where q is the number of queries.
Then we can make a charArray that has the same length as the original string, traverse all the elements remaining in each slot of the array of objects of type X, and put them in their respective places. Once this is done, we can traverse this charArray again, ignore all the empty places, and construct a string from the elements left.
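To make the plan above concrete, here is a minimal sketch in Java. It uses an ArrayList<Integer> as a stand-in for the type X, so each deletion costs O(k) for a slot with k entries rather than the O(1) being asked for. The class name, the method name and the query format (lowercase letters, 1-based occurrence numbers) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class NthOccurrenceRemoval {
    // letters[q] and occurrences[q] together form query q (hypothetical format):
    // "delete the occurrences[q]-th remaining occurrence of letters[q]".
    static String apply(String s, char[] letters, int[] occurrences) {
        // One slot per lowercase letter; each slot holds positions in s, in order.
        List<List<Integer>> slots = new ArrayList<>();
        for (int c = 0; c < 26; c++) {
            slots.add(new ArrayList<>());
        }
        for (int i = 0; i < s.length(); i++) {
            slots.get(s.charAt(i) - 'a').add(i);
        }
        // ArrayList stands in for X here, so this removal shifts elements (not O(1)).
        for (int q = 0; q < letters.length; q++) {
            slots.get(letters[q] - 'a').remove(occurrences[q] - 1);
        }
        // Put the surviving characters back at their original positions.
        char[] out = new char[s.length()]; // '\0' marks an empty place
        for (int c = 0; c < 26; c++) {
            for (int pos : slots.get(c)) {
                out[pos] = (char) ('a' + c);
            }
        }
        StringBuilder result = new StringBuilder();
        for (char ch : out) {
            if (ch != '\0') {
                result.append(ch);
            }
        }
        return result.toString();
    }
}
```

For the string "abab", deleting the 2nd 'a' and then the 1st 'b' leaves "ab". Swapping the ArrayList for a faster structure would change only the removal line.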
Consider an ArrayList. Internally it is not full, and the number of elements inserted so far is known. The elements are not sorted.
Choose the operations listed below that are fast regardless of the number of elements contained in the ArrayList. (In other words, takes only several instructions to implement).
Insertion
Insertion at a given index
Getting the data from a specified index
Finding the maximum value in an array of integers (not necessarily sorted)
Deletion at the given index
Replacing an element at a specified index
Searching for a specific element
I chose Insertion at a given index, Getting the data from a specified index, and Replacing an element at a specified index, but the answer key says Insertion. As I understand it, in an ArrayList, the insert operation requires the elements after the insertion point to shift right. If we did this at the beginning of the list, we would have $O(n)$ time complexity. However, if we did it at the end, it would be $O(1)$.
My question comes down to: (1) what, if any, difference is there between insertion and insertion at a specified index, and (2) given this particular time complexity for insertion, why is it considered "fast"?
First take a look at these two methods defined in java.util.ArrayList:
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    rangeCheckForAdd(index);
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}
Now if you look at the first method (just adding an element), it only ensures that there is sufficient capacity and appends the element to the end of the list.
So if there's sufficient capacity, this operation takes O(1) time; otherwise the backing array must be grown and all the elements copied into the new one, which makes that particular call O(n). Averaged over many appends, the cost is still O(1) (amortized).
In the second method, when you specify an index, all the elements from that index onward are shifted, and this definitely takes more time than the former.
For the first question the answer is this:
Insertion at a specified index i takes O(n), since all the elements following i have to be shifted one position to the right.
On the other hand, simple insertion as implemented in Java (ArrayList.add()) takes only amortized O(1), because the element is appended to the end of the array, so no shift is required.
For the second question, it is clear why simple insertion is fast: apart from the occasional resize, no extra work is needed, so the time is constant.
Internally, an ArrayList is nothing but an array. When that array is full, add uses Arrays.copyOf to create a new, larger array with the original content intact.
So about insertion: whether you do a simple add (which adds the data at the end of the array) or insert at, say, the first (0th) index, it will still be faster than in most data structures, keeping in mind the simplicity of the data structure.
The only difference is that a simple add requires no shifting, whereas adding at an index requires shifting the following elements to the right (and deleting shifts them to the left). This is done with System.arraycopy, which copies the affected range of the array to its new position.
So yes, simple insertion is faster than indexed insertion.
(1) what, if any, difference is there between insertion and insertion at specified index and
An ArrayList stores its elements consecutively. Adding to the end of the ArrayList does not require the ArrayList to be altered in any way except for adding the new element to the end of itself (apart from an occasional resize of the backing array). Thus, this operation is amortized O(1), taking constant time, which is favorable when you want to perform an action repeatedly on a data structure.
Adding an element at an index, however, requires the ArrayList to make room for the element in some way. How is that done? Every element following the inserted element has to be moved one step to make room for the new insertion. Your index is anything between the first element and the nth element (inclusive). This operation is thus O(1) at best and O(n) at worst, where n is the size of the array. For large lists, O(n) takes significantly longer than O(1).
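The difference between the two kinds of insertion can be seen in a small array-backed list. The following is only a sketch, not the JDK implementation: TinyArrayList, its int-only storage and the shifted counter are assumptions introduced here to count how many elements each kind of insertion has to move.

```java
import java.util.Arrays;

class TinyArrayList {
    private int[] data = new int[4];
    private int size = 0;
    long shifted = 0; // number of elements moved to make room for insertions

    // Grow the backing array when it is full (the occasional O(n) copy).
    private void ensureCapacity() {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
    }

    // Append: nothing has to move.
    void add(int value) {
        ensureCapacity();
        data[size++] = value;
    }

    // Insert at index: everything from index onward moves one step right.
    void add(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException();
        }
        ensureCapacity();
        System.arraycopy(data, index, data, index + 1, size - index);
        shifted += size - index;
        data[index] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        return data[index];
    }

    int size() {
        return size;
    }
}
```

Appending three values leaves shifted at 0, while a single add(0, x) afterwards moves all three existing elements.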
(2) given this particular time complexity for insertion why is it considered "fast"
It is considered fast because it is O(1), or constant time. The "1" does not mean strictly one single operation; it means that the number of operations does not depend on the size of the input, which in your example is the size of the ArrayList. The constant hidden in O(1) can be large, but constant time is generally regarded as the best time complexity achievable. To put this into context, an O(1) operation takes roughly 1 * k operations on an ArrayList with 1000 elements, while an O(n) operation takes roughly 1000 * k operations, where k is some constant.
Big-O notation describes how the number of operations performed by an action or a whole program grows with the size of its input.
For more information about big O-notation:
What is a plain English explanation of "Big O" notation?
Given a string of words separated by single spaces, I need to transfer each word in the String to a Node in a linked list, and keep the list sorted lexically (like in a dictionary).
The first step I did was to move through the String and put every word in a separate Node. Now I'm having a hard time sorting the list; it has to be done in the most efficient way.
Merge sort is O(n log n). Would merge sort be the best choice here?
Generally if you had a list and wanted to sort it merge sort is a good solution. But in your case you can make it better.
You have a string separated by spaces and you break it and put it in list's nodes. Then you want to sort the list.
You can do better by combining both steps.
1) Have a linked list with head and tail and pointers to previous node.
2) As you extract a word from the sentence, store the word in the list in sorted order. That is, start from the tail or head of the list, depending on whether the word is larger or smaller than these elements, and go forward until you reach an element larger/smaller than the current one. Insert it at that location; you only need to update the pointers. Bear in mind that each insertion may scan O(n) nodes, so the whole process is O(n^2) in the worst case.
Just use the built-in Collections.sort, which is a mergesort implementation. More specifically:
This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.
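As a rough illustration of the Collections.sort suggestion, the sketch below splits a sentence on single spaces, puts the words into a java.util.LinkedList and sorts it. The class and method names are made up for this example, and it relies on the default String ordering, which is case-sensitive.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

class SortWords {
    // Returns the words of the sentence as a lexically sorted linked list.
    static List<String> sortedWords(String sentence) {
        List<String> words = new LinkedList<>(Arrays.asList(sentence.split(" ")));
        Collections.sort(words); // natural (lexicographic) String order
        return words;
    }
}
```

For example, sortedWords("pear apple fig") yields the list [apple, fig, pear].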
I have an ArrayList which I fill with objects of type Integer in a serial fashion (i.e. one-by-one) from the end of the ArrayList (i.e. using the method add(object)). Every time I do this, the other objects in the ArrayList are of course left-shifted by one index.
In my code I want to find the index of a random object in the ArrayList. I want to avoid using the indexOf method because I have a very big ArrayList and the looping will take an enormous amount of time. Are there any workarounds? Some idea how to keep in some data structure maybe the indexes of the objects that are in the ArrayList?
EDIT: Apparently my question was not clear, or I had a misunderstanding of the arraylist.add(object) method (which is also very possible!). What I want is something like a sliding window, with objects being inserted at one end of the ArrayList and dropped from the other, and as an object is inserted at one end the others are shifted by one index. I could use arraylist.add(0, object) to insert the objects at the left of the ArrayList, right-shifting the previous objects by one index each time, but after a Google search I found that this is a very processing-intensive operation, O(N) if I remember right. Thus I thought "ok, let's insert the objects from the right end of the ArrayList, no problem!", assuming that each insertion would still move the previous objects by one index (to the left this time).
Also, when I use the term "index" I simply mean the position of the object in the ArrayList; maybe there is some more formal term "index" which means something different.
You have a couple of options. Here are the two basic options:
You can maintain a Map<Object,Integer> that holds indexes, in parallel to the array. When you append an element to the array you can just add it to the map. When you remove an element from the beginning you will have to iterate through the entire map and subtract one from every index.
If it's appropriate for your situation and the Map does not meet your performance requirements, you could add an index field to your objects and store the index directly when you add it to the array. When you remove an element from the beginning you will have to iterate through all objects in the list and subtract one from their index. Then you can obtain the index in constant time given an object.
These still have the performance hit of updating the indexes after a remove. Now, after you choose one of these options, you can avoid having to iterate through the map / list to update after removal if you make a simple improvement:
Instead of storing the index of each object, store a count of the total number of objects added so far. Then to get the actual index, simply subtract the count value of the first object from the value of the one you are looking for. E.g. when you add:
add a to end;
a.counter = counter++;
remove first object;
(The initial value of counter when starting the program doesn't really matter.) Then to find an object "x":
index = x.counter - first object.counter;
Whether you store counter as a new field or in a map is up to you. Hope that helps.
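A minimal sketch of the counter idea, storing the counters in a map. CountedWindow and its method names are hypothetical, the window is backed by an ArrayDeque for simplicity, and the sketch assumes the objects are distinct with consistent equals/hashCode.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

class CountedWindow<T> {
    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final Map<T, Long> counterOf = new HashMap<>();
    private long nextCounter = 0; // the starting value does not matter

    // Append at the end and remember how many objects were added before it.
    void addLast(T item) {
        items.addLast(item);
        counterOf.put(item, nextCounter++);
    }

    // Drop the oldest object; no other counter needs updating.
    T removeFirst() {
        T first = items.removeFirst();
        counterOf.remove(first);
        return first;
    }

    // Current position of item in the window, or -1 if it is not present.
    int indexOf(T item) {
        Long counter = counterOf.get(item);
        if (counter == null) {
            return -1;
        }
        long firstCounter = counterOf.get(items.peekFirst());
        return (int) (counter - firstCounter);
    }
}
```

After adding 10, 20 and 30, indexOf(30) is 2; after one removeFirst() it becomes 1, without touching any stored counter.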
By the way; a linked list will have better performance when removing object from the front of the list, but worse when accessing an object by index. It may be more appropriate depending on your balance of add/remove vs. random access (if you only care about the index but never actually need to retrieve an object by index, random access performance doesn't matter). If you really need to optimize further you could consider using a fixed-capacity ring buffer instead (back inserts, front removes, and random access will all be O(1)).
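For reference, a fixed-capacity ring buffer can be sketched roughly as follows. This is an illustrative int-only version with made-up names, not a library class; a generic version would follow the same pattern.

```java
import java.util.NoSuchElementException;

class IntRingBuffer {
    private final int[] slots;
    private int head = 0; // array position of the oldest element
    private int size = 0;

    IntRingBuffer(int capacity) {
        slots = new int[capacity];
    }

    // Back insert in O(1); fails when the buffer is full.
    void addLast(int value) {
        if (size == slots.length) {
            throw new IllegalStateException("buffer is full");
        }
        slots[(head + size) % slots.length] = value;
        size++;
    }

    // Front remove in O(1): only the head position advances.
    int removeFirst() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        int value = slots[head];
        head = (head + 1) % slots.length;
        size--;
        return value;
    }

    // Random access in O(1), relative to the current front.
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException();
        }
        return slots[(head + index) % slots.length];
    }

    int size() {
        return size;
    }
}
```

No element is ever shifted: removing from the front moves head, and indices wrap around the end of the array.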
Of course, option 3 is to reconsider your algorithm at a higher level; perhaps there is a way to accomplish the behavior you are seeking that does not require finding the objects in the list.
This question already has answers here:
When to use LinkedList over ArrayList in Java?
(33 answers)
Closed 9 years ago.
Talking in Java's context. If I want to insert in the middle of either an ArrayList or a LinkedList, I've been told that the ArrayList will perform terribly.
I understand that this is because we need to shift all the elements and then do the insertion. This should be of the order n/2, i.e. O(n).
But isn't it the same for a LinkedList? For a linked list, we need to traverse until we find the middle, and then do the pointer manipulation. In this case too, it will take O(n) time, won't it?
Thanks
The reason here is that there's no actual shifting of elements in the linked list. A linked list is built up from nodes, each of which holds an element and a pointer to the next node. To insert an element into a list requires only a few things:
create a new node to hold the element;
set the next pointer of the new node to the node that currently follows the previous node;
set the next pointer of the previous node to the new node.
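In code, these steps might look like the sketch below for a hand-rolled singly linked list. The Node class and insertAfter are illustrative names, not part of java.util.LinkedList.

```java
class SinglyLinked {
    static final class Node<E> {
        E element;
        Node<E> next;

        Node(E element) {
            this.element = element;
        }
    }

    // Inserts element right after 'previous' in O(1); nothing is shifted.
    static <E> Node<E> insertAfter(Node<E> previous, E element) {
        Node<E> node = new Node<>(element); // create a new node
        node.next = previous.next;          // new node points at the old successor
        previous.next = node;               // previous node points at the new node
        return node;
    }
}
```

The new node's next pointer is set first, so the reference to the rest of the list is never lost.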
If you've ever made a chain of paper clips, you can think of each paper clip as being the beginning of the chain of it and all the paper clips that come after it. To stick a new paper clip into the chain, you only need to disconnect the paper clips at the spot where the new one will go, and insert the new one. A LinkedList is like a paper clip chain.
An ArrayList is kind of like a pillbox or a mancala board where each compartment can hold only a single item. If you want to insert a new one in the middle (and keep all the elements in the same order), you're going to have to shift everything after that spot.
The insertion after a given node in a linked list is constant time, as long as you already have a reference to that node (with a ListIterator in Java), but getting to that position will typically require time linear in the position of the node. That is, getting to the nth node takes n steps. In an array list (or array, or any structure that's based on contiguous memory, really) the address of the nth element in the list is just (address of 1st element)+n×(size of element), a trivial bit of arithmetic, and our computing devices support quick access to arbitrary memory addresses.
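With the JDK classes, the two costs can be seen separately in a short sketch like this one (MiddleInsert and insertAt are hypothetical names): obtaining the iterator walks to the position, and the add itself only relinks nodes when the list is a LinkedList.

```java
import java.util.List;
import java.util.ListIterator;

class MiddleInsert {
    // Inserts element so that it ends up at the given position of the list.
    static <E> void insertAt(List<E> list, int position, E element) {
        // For a LinkedList this walks node by node to the position: linear time.
        ListIterator<E> it = list.listIterator(position);
        // Once positioned, the insertion only rewires neighbouring nodes.
        it.add(element);
    }
}
```

Calling insertAt on a LinkedList holding a, b, d with position 2 and element c gives a, b, c, d.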
I think, when analysing the complexity, you need to take into account the metric you are using. In the ArrayList, your metric is shuffling, which is just assignment. But this is quite a complex operation.
On the other hand, when you're using a LinkedList, you're simply following references until you reach the position. In fact, you only perform 1 insertion. So while the algorithmic complexity will wind up similar, the actual processes that are being executed in O(n) time are different. In the case of an ArrayList, it is performing a lot of memory manipulation. In the case of a LinkedList, it's only reading.
For those saying he doesn't understand LinkedLists
A LinkedList only has a pointer at the start and a pointer at the end. It does not automatically know the Node behind the node you want to delete (unless it's a doubly linked list), so you need to traverse the list from the start with a temp pointer until you come to the node before the one you want to delete, and I believe it's this that OP is discussing.
I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1); after this operation the picked element is removed from the structure (without replacement).
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in O(n).
the complexity (or even possibility) of other actions (eg insert) does not matter. Besides the complexity it should also be efficient for small numbers of n.
Can anyone give me guidelines on implementing such a structure? I currently implemented a structure having all above properties, except the picking of the element takes O(d) with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can figure out structures allowing picking in O(1), but these have higher complexities on at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent from #earlier picked elements and independent from total #elements.
*in monte carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You specify a lot of operations, but all of them are nothing other than insertion, removal and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertions. HashMap is fine.
one can obtain a random element in O(1), after this operation the picked element is removed from the structure. (without replacement)
This is the only op that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
Each undo is an insertion operation: O(1), so O(p) in total.
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
You can traverse a HashMap in O(n).
EDIT:
example of picking a random element in O(n):
HashMap map = ...;                       // the populated map
int randomIndex = ...;                   // random int in [0, map.size())
Collection collection = map.values();
Object[] values = collection.toArray();  // O(n) copy of the values
Object picked = values[randomIndex];
Ok, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs <object, index in array>. For example, say your objects (strings for simplicity) are:
{"abc", "def", "ghi"}
Create a list:
List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap map with the following values:
Map<String, Integer> map = new HashMap<>();
for (int i = 0; i < array.size(); i++)
{
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in the map and get its array index. Let's call this index i (i = map.get(object)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array). Reduce the size of the array by 1 (since there is one less element now) and remove the deleted object's entry from the map - O(1)
Update the index of the new object in position i of the array in map (map.put(array[i], i)) - O(1)
I apologize for the mix of java and cpp notation, hope this helps
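The array-plus-map scheme described in this answer can be sketched in plain Java roughly as follows. IndexedBag and its method names are invented for illustration, and the sketch assumes the stored objects are distinct and have consistent equals/hashCode. HashMap operations are O(1) on average, not in the worst case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class IndexedBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();
    private final Random random = new Random();

    // O(n) initialization: store each object and remember its index.
    IndexedBag(List<T> initial) {
        for (T item : initial) {
            indexOf.put(item, items.size());
            items.add(item);
        }
    }

    // Removes item by moving the last element into its place: expected O(1).
    boolean remove(T item) {
        Integer boxed = indexOf.remove(item);
        if (boxed == null) {
            return false;
        }
        int i = boxed;
        int last = items.size() - 1;
        if (i != last) {
            T moved = items.get(last);
            items.set(i, moved);
            indexOf.put(moved, i); // the moved element has a new index
        }
        items.remove(last); // removing the last element shifts nothing
        return true;
    }

    // Picks a uniformly random element and removes it; fails if the bag is empty.
    T pickRandom() {
        T picked = items.get(random.nextInt(items.size()));
        remove(picked);
        return picked;
    }

    int size() {
        return items.size();
    }
}
```

The order of the remaining elements is not preserved, which is acceptable here because the question does not require any ordering.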
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
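A small sketch of the shuffle-based approach analysed above. ShuffledPicker is a made-up name, and instead of subList() this version simply removes the last element of the ArrayList, which is also O(1).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ShuffledPicker<T> {
    private final List<T> remaining;

    ShuffledPicker(List<T> source) {
        remaining = new ArrayList<>(source); // O(n) copy
        Collections.shuffle(remaining);      // O(n) shuffle
    }

    // The list is already in random order, so taking the last element is a random pick.
    T pick() {
        return remaining.remove(remaining.size() - 1);
    }

    // Undo a pick by appending the element again.
    void undo(T item) {
        remaining.add(item);
    }

    int size() {
        return remaining.size();
    }
}
```

One caveat of this sketch: an element restored with undo sits at the end of the list, so it would be the next one picked unless the list is shuffled again.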
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around online you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
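The picked/unpicked split described in this answer might be sketched as below. PickPool and its method names are assumptions for illustration. Because elements are only swapped within the array, nothing is recopied or shifted on a pick.

```java
import java.util.Random;

class PickPool<T> {
    private final T[] items;
    private int boundary; // items[0 .. boundary) are unpicked, the rest are picked
    private final Random random = new Random();

    PickPool(T[] source) {
        items = source.clone(); // O(n) initialization
        boundary = items.length;
    }

    // Picks a random unpicked item in O(1); throws if nothing is left to pick.
    T pick() {
        int i = random.nextInt(boundary);
        boundary--;
        T picked = items[i];
        items[i] = items[boundary]; // last unpicked item fills the hole
        items[boundary] = picked;   // picked item now sits just past the boundary
        return picked;
    }

    // Undoes the most recent p picks by moving the boundary back.
    void undo(int p) {
        boundary = Math.min(items.length, boundary + p);
    }

    int remaining() {
        return boundary;
    }
}
```

Undoing works because the most recently picked items always sit directly after the boundary, so moving the boundary forward by p restores exactly the last p picks.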