Handling duplicates when shuffling an array - java

I wish to shuffle an array with duplicate elements. Using the shuffle method from Collections, how does it handle duplicates? I don't want two duplicates swapping each other. Thanks

All that is guaranteed about the behavior of the method is in the Javadoc.
The current implementation chooses swaps randomly, without regard to the content of the list. The general contract states that the ideal is for all permutations to be equally likely, so I would not anticipate it ever going to an implementation that requires an element to move, much less requires that it be swapped with an element of differing value. In general that's what shuffling is - random order, which can just as well (some small percentage of the time) mean "the same order that came in (or an equivalent order)". And the shuffle() method addresses the general case.
If you need every element to be swapped with an element of differing value, you can of course write your own method to do that. Beware that a naive implementation could fall into an infinite loop if there are too many duplicates relative to the size of the collection.
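As a rough illustration of the "write your own method" idea, here is a minimal sketch that re-shuffles until no position holds a value equal to the one it started with, and gives up after a bounded number of attempts so that too many duplicates cannot cause an infinite loop. The class name DisplacingShuffle, both method names and the maxAttempts parameter are assumptions made for this example; they are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical helper, not a JDK class.
final class DisplacingShuffle {

    /**
     * Returns a shuffled copy in which no index holds a value equal to the
     * value the input had at that index. Throws after maxAttempts failed
     * tries instead of looping forever.
     */
    static <T> List<T> shuffleMovingEveryValue(List<T> input, Random rnd, int maxAttempts) {
        List<T> candidate = new ArrayList<>(input);
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Collections.shuffle(candidate, rnd);
            if (everyValueMoved(input, candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
                "No valid arrangement found in " + maxAttempts + " attempts");
    }

    /** True if the two lists differ in value at every index. */
    static <T> boolean everyValueMoved(List<T> before, List<T> after) {
        for (int i = 0; i < before.size(); i++) {
            if (Objects.equals(before.get(i), after.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 1, 2, 2, 3, 3);
        System.out.println(shuffleMovingEveryValue(data, new Random(), 1000));
    }
}
```

Retrying whole shuffles is simple but wasteful; it is only reasonable when valid arrangements are fairly common. An input such as [1, 1, 1, 2] has no valid arrangement at all, which is exactly the case the attempt limit protects against.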

This is the source of Collections#shuffle(List, Random) (the original post linked a pastebin that included the documentation):
@SuppressWarnings({"rawtypes", "unchecked"})
public static void shuffle(List<?> list, Random rnd) {
    int size = list.size();
    if (size < SHUFFLE_THRESHOLD || list instanceof RandomAccess) {
        for (int i=size; i>1; i--)
            swap(list, i-1, rnd.nextInt(i));
    } else {
        Object arr[] = list.toArray();
        // Shuffle array
        for (int i=size; i>1; i--)
            swap(arr, i-1, rnd.nextInt(i));
        // Dump array back into list
        // instead of using a raw type here, it's possible to capture
        // the wildcard but it will require a call to a supplementary
        // private method
        ListIterator it = list.listIterator();
        for (int i=0; i<arr.length; i++) {
            it.next();
            it.set(arr[i]);
        }
    }
}
This is the overloaded variant of Collections#shuffle(List).
The difference is that you can pass your own Random object if you want to seed it yourself.
As you can see, it does not look at the values in the list at all. Collections.shuffle is a static method, so you cannot override it, but you could write your own shuffle method that includes a check for duplicates.
On a side note: try checking the Javadoc for this kind of question. If you are unsure how a method works, search for the class and method name, or read the Java source code that ships with the JDK on your computer.
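To illustrate the point about passing your own Random, here is a small sketch showing that the same seed produces the same order, duplicates included. The class SeededShuffleDemo and the method shuffledCopy are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical demo class, not part of the JDK.
final class SeededShuffleDemo {

    /** Shuffles a copy of the input with a Random created from the given seed. */
    static <T> List<T> shuffledCopy(List<T> input, long seed) {
        List<T> copy = new ArrayList<>(input);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "b", "c");
        // Both lines print the same order because the seed is the same.
        System.out.println(shuffledCopy(letters, 42L));
        System.out.println(shuffledCopy(letters, 42L));
    }
}
```

The shuffle never inspects the elements, so the two "b" entries are treated like any other pair of positions.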

Related

Returning Duplicates Arraylist

Below is a simple for loop I am using to try to find the repeated IDs in an array list. The problem is that it only checks one index to the right, so if the same ID appears two, three or even four indexes further along, the loop will miss it and not report it as a repeated ID.
The goal of this code is to move through each index of the array list, get the ID and check whether any other entries have the same ID.
Note: in the code below, arraylist is the array list, and the getId method simply returns the user ID for that element.
for (int i=0; i<arraylist.size()-1; i++) {
if (arraylist.get(i).getId() == arraylist.get(i+1).getId()) {
System.out.println(arraylist.get(i).getId());
}
}
What I've tried and keep coming back to is two nested for loops, one iterating through the array list and one iterating through an array of user IDs. My plan was to check whether the current array list ID was already in the array of 'pure' IDs and, if it wasn't, add it to the array of 'pure' IDs. It would look something like this in pseudocode.
for i<-0 i<arraylist size-1 i++
for j<-0 j<pureArray size j++
if arraylist.getId(i) != pureArray[j] then
increment pureArray size by one
add arraylist.getId(i) to pureArray
In practice, perhaps due to my poor coding, this did not work.
So, any opinions on how I can iterate completely through my array list, then check and return whether any of the retrieved IDs have multiple entries?
Thank you.
Looking at leifg's answer on this similar question, you can use two sets, one for duplicates and one for everything else, and use Set#add(E), which "returns true if this set did not already contain the specified element," to determine whether or not an element is a duplicate. All you have to do is change the sets' type parameters and what you add to them:
public Set<Integer> findDuplicates(List<MyObject> listContainingDuplicates)
{
    // Assuming your ID is of type int
    final Set<Integer> setToReturn = new HashSet<>();
    final Set<Integer> set1 = new HashSet<>();
    for (MyObject object : listContainingDuplicates)
    {
        // add() returns false when the ID has been seen before
        if (!set1.add(object.getID()))
        {
            setToReturn.add(object.getID());
        }
    }
    return setToReturn;
}
For the purpose of getting duplicates, a nested for loop should do the job; see the code below. First, though, think about what you expect the nested loop to do.
Regarding your pseudocode:
for i<-0 i<arraylist size i++
for j<-i+1 j<arraylist size j++
if arraylist.getId(i) == arraylist.getId(j) then
add arraylist.getId(i) to duplicates
1) Regarding j<-i+1: you do not want to compare the same pair more than once. With this setup you compare the first element with all the others, then move to the second and compare it with the rest (not including the first, because you already did that comparison), and so on.
2) Growing your array on every single iteration is highly impractical, as you need to create a new array and copy the contents each time. I would rather make sure the array is big enough initially, or use another data structure such as another ArrayList or just a String.
Here is a small demo of what I did. It is just a quick test and far from perfect.
import java.util.ArrayList;
public class Main {
    public static void main(String[] args) {
        // create a test list with ID strings
        ArrayList<String> test = new ArrayList<>();
        test.add("123");
        test.add("234");
        test.add("123");
        test.add("123");
        String duplicates = "";
        for(int i = 0; i < test.size(); i++) {
            for(int j = i+1; j < test.size(); j++) {
                // if values are equal AND current value is not already a part
                // of duplicates string, then add it to duplicates string
                if(test.get(i).equals(test.get(j)) && !duplicates.contains(test.get(j))) {
                    duplicates += " " + test.get(j);
                }
            }
        }
        System.out.println(duplicates);
    }
}
Purely for the purpose of finding duplicates, you can also create a HashSet and iteratively add the objects (IDs in your case) to it using the .add(e) method.
The trick with HashSet is that it does not allow duplicate values, and the .add(e) method returns false if the set already contains the value being added.
But be careful about what values (objects) you pass to .add(), since HashSet uses .hashCode() and .equals() to compare whatever you feed it. It works out of the box if you pass Strings.
If you pass your own objects, make sure you override both .equals() and .hashCode() in that object's class definition, because those are the methods .add() uses to compare the objects.
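The following sketch shows what that looks like for a custom key type. The UserKey class, its id and region fields, and the DuplicateKeys wrapper are all hypothetical names made up for the example; the only JDK behaviour relied on is that HashSet.add returns false for an element that is already present according to hashCode() and equals().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical example classes, not taken from the question.
final class DuplicateKeys {

    /** A value type that defines equality by its fields. */
    static final class UserKey {
        private final int id;
        private final String region;

        UserKey(int id, String region) {
            this.id = id;
            this.region = Objects.requireNonNull(region);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof UserKey)) return false;
            UserKey other = (UserKey) o;
            return id == other.id && region.equals(other.region);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals(): equal keys give equal hashes.
            return Objects.hash(id, region);
        }

        @Override
        public String toString() {
            return id + "@" + region;
        }
    }

    /** Returns every key that occurs more than once in the list. */
    static Set<UserKey> findDuplicates(List<UserKey> keys) {
        Set<UserKey> seen = new HashSet<>();
        Set<UserKey> duplicates = new HashSet<>();
        for (UserKey key : keys) {
            if (!seen.add(key)) { // false means "already in the set"
                duplicates.add(key);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<UserKey> keys = Arrays.asList(
                new UserKey(1, "eu"), new UserKey(2, "eu"),
                new UserKey(1, "eu"), new UserKey(1, "us"));
        System.out.println(findDuplicates(keys));
    }
}
```

Without the two overrides, each UserKey instance would only be equal to itself, and no duplicates would ever be reported.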

Java LinkedList : remove from to to

I have a java.util.LinkedList containing data logically like
1 > 2 > 3 > 4 > 5 > null
and I want to remove elements from 2 to 4 and make the LinkedList like this
1 > 5 > null
In reality we should be able to achieve this in O(n) complexity considering you have to break chain at 2 and connect it to 5 in just a single operation.
In Java's LinkedList I cannot find any method that lets me remove a chain from the list, given from and to, in a single O(n) operation.
It only gives me the option to remove the elements individually (making each operation O(n)).
Is there any way I can achieve this in just a single operation (without writing my own List)?
One solution provided here solves the problem using single line of code, but not in single operation.
list.subList(1, 4).clear();
The question is more about algorithms and performance. When I checked the performance, this was actually slower than removing the elements one by one. I am guessing this solution does not actually remove the entire sublist in O(n) but removes the elements one by one (each removal being O(n)), and also adds extra computation to create the sublist.
Average of 1000000 computations in ms:
Without sublist = 1414
With the provided sublist solution = 1846
The way to do it in one step is
list.subList(1, 4).clear();
as documented in the Javadoc for java.util.LinkedList#subList(int, int).
Having checked the source code, I see that this ends up removing the elements one at a time. subList is inherited from AbstractList. This implementation returns a List that simply calls removeRange on the backing list when you invoke clear on it. removeRange is also inherited from AbstractList and the implementation is
protected void removeRange(int fromIndex, int toIndex) {
ListIterator<E> it = listIterator(fromIndex);
for (int i=0, n=toIndex-fromIndex; i<n; i++) {
it.next();
it.remove();
}
}
As you can see, this removes the elements one at a time. listIterator is overridden in LinkedList, and it starts by finding the first node by following chains either by following links from the start of the list or the end (depending on whether fromIndex is in the first or second half of the list). This means that list.subList(i, j).clear() has time complexity
O(j - i + min(i, list.size() - i)).
Apart from the case where you are better off starting from the end and removing the elements in reverse order, I am not convinced there is a solution that is noticeably faster. Testing the performance of code is not easy, and it is easy to be drawn to false conclusions.
There is no way of using the public API of the LinkedList class to remove all the elements in the middle in one go. This surprised me, as about the only reason for using a LinkedList rather than an ArrayList is that you are supposed to be able to insert and remove elements from the middle efficiently, so I thought this case worth optimising (especially as it's so easy to write).
If you absolutely need the O(1) performance that you should be able to get from a call such as
list.subList(1, list.size() - 1).clear();
you will either have to write your own implementation or do something fragile and unwise with reflection like this:
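// Requires: import java.lang.reflect.Field; import java.lang.reflect.Method; import java.util.LinkedList;
// Only valid for 0 < from < to < list.size().
// Note: on JDK 16 and later, setAccessible on java.util internals throws
// InaccessibleObjectException unless the JVM is started with
// --add-opens java.base/java.util=ALL-UNNAMED.
public static void main(String[] args) {
    LinkedList<Integer> list = new LinkedList<>();
    for (int a = 0; a < 5; a++)
        list.add(a);
    removeRange_NEVER_DO_THIS(list, 2, 4);
    System.out.println(list); // [0, 1, 4]
}
public static void removeRange_NEVER_DO_THIS(LinkedList<?> list, int from, int to) {
    try {
        // Look up the private node(int) method to reach the internal nodes
        Method node = LinkedList.class.getDeclaredMethod("node", int.class);
        node.setAccessible(true);
        Object low = node.invoke(list, from - 1);
        Object hi = node.invoke(list, to);
        Class<?> clazz = low.getClass();
        Field nextNode = clazz.getDeclaredField("next");
        Field prevNode = clazz.getDeclaredField("prev");
        nextNode.setAccessible(true);
        prevNode.setAccessible(true);
        // Splice the two nodes together, skipping everything in between
        nextNode.set(low, hi);
        prevNode.set(hi, low);
        Field size = LinkedList.class.getDeclaredField("size");
        size.setAccessible(true);
        size.set(list, list.size() - to + from);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}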
To remove the middle elements in a single operation (method call) you could subclass java.util.LinkedList and expose the protected method AbstractList.removeRange(int, int):
list.removeRange(1, 4);
(Credit to the person who posted this answer then removed it. :)) However, even this method calls ListIterator.remove() n times.
I do not believe there is a way to remove n consecutive entries from a java.util.LinkedList without performing n operations under the hood.
In general removing n consecutive items from any linked list seems to require O(n) operations as one must traverse from the start index to the end index one item at a time - inherently - in order to find the next list entry in the modified list.
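For reference, the subclassing approach mentioned above could look roughly like this. RangeLinkedList is a name made up for the sketch; it only widens the visibility of the inherited removeRange, so it still removes the elements one at a time through the list iterator.

```java
import java.util.LinkedList;

// Hypothetical subclass: makes the protected AbstractList.removeRange public.
final class RangeLinkedList<E> extends LinkedList<E> {
    private static final long serialVersionUID = 1L;

    /** Removes the elements from fromIndex (inclusive) to toIndex (exclusive). */
    @Override
    public void removeRange(int fromIndex, int toIndex) {
        super.removeRange(fromIndex, toIndex);
    }

    public static void main(String[] args) {
        RangeLinkedList<Integer> list = new RangeLinkedList<>();
        for (int i = 1; i <= 5; i++) {
            list.add(i);
        }
        list.removeRange(1, 4); // drops the elements 2, 3 and 4
        System.out.println(list); // [1, 5]
    }
}
```

This gives a single method call, not a single operation: the cost is the same as list.subList(1, 4).clear().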

Insertion Sort LinkedList Java

I'm trying to write an insertion sort for a LinkedList. I have a working method, but it is incredibly slow: over an hour to add and sort 50,000 elements.
public void insert(Custom c)
{
int i = 0;
for (i = 0; i < list.size(); i++)
{
if(list.get(i).compareTo(c) > 0 )
{
list.add(i,c);
return;
}
}
list.add(c);
}
I know I could use Collections.sort, but for this assignment I am required to write my own LinkedList. I'm not asking for a full solution, just some pointers.
First of all, insertion sort on a List is going to be slow (O(N^2)) ... no matter how you do it. But you appear to have implemented it as O(N^3).
Here is your code ... which will be called N times, to add each list element.
public void insert(Entry e)
{
int i = 0;
for (i = 0; i < list.size(); i++) // HERE #1
{
if(list.get(i).compareTo(e) > 0 ) // HERE #2
{
list.add(i,e); // HERE #3
return;
}
}
list.add(e); // HERE #4
}
At "HERE #1" we iterate up to M times where M is the current (partial) list length; i.e. O(M). This is inherent in an insertion sort. However, depending on how you implemented the size() method, you could have turned the iteration into an O(M^2) sequence of operations. (The LinkedList.size() method just returns the value of a size variable. No problem here. But if size() counted the elements ... )
At "HERE #2" we have a fetch and a comparison. The comparison (compareTo(...)) is cheap, but the get(i) operation on a linked list involves traversing the list from the beginning. That is an O(M) operation. And since you make the get(i) call O(M) times per insert call, this makes the call O(M^2) and the sort O(N^3).
At "HERE #3" the add(i,e) repeats the list traversal of the previous get(i) call. But that's not so bad because you only execute that add(i,e) call once per insert call. So the overall complexity is not affected.
At "HERE #4" the add() operation could be either O(1) or O(M) depending on how it is implemented. (For LinkedList.add() it is O(1) because the list data structure keeps a reference to the last node of the list.) Either way, overall complexity is not affected.
In short:
The code at #2 definitely makes this an O(N^3) sort.
The code at #1 could also make it O(N^3) ... but not with the standard LinkedList class.
So what to do?
One approach is to recode the insert operation so that it traverses the list using the next and prev fields, etcetera directly. There should not be calls to any of the "higher level" list operations: size, get(i), add(e) or add(i, e).
However, if you are implementing this by extending or wrapping LinkedList, this is not an option. Those fields are private.
If you are extending or wrapping LinkedList, then the solution is to use the listIterator() method to give you a ListIterator, and use that for efficient traversal. The add operation on a ListIterator is O(1).
If (hypothetically) you were looking for the fastest way to sort a (large) LinkedList, then the solution is to use Collections.sort. Under the covers, that method copies the list contents to an array, does an O(NlogN) sort on the array, and reconstructs the list from the sorted array.
According to this response, you should use ListIterator.add() instead of List.add due to the better performance.
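A minimal sketch of the ListIterator approach, assuming the list is a java.util.LinkedList that is already sorted and the elements are Comparable. The class SortedInsert and the method insertSorted are illustrative names, not something from the question.

```java
import java.util.LinkedList;
import java.util.ListIterator;

// Hypothetical helper showing a single-pass sorted insert.
final class SortedInsert {

    /**
     * Inserts item into an already sorted list with one traversal.
     * No get(i) or add(i, e) calls are made, so each insert is O(M).
     */
    static <T extends Comparable<? super T>> void insertSorted(LinkedList<T> list, T item) {
        ListIterator<T> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next().compareTo(item) > 0) {
                it.previous(); // step back so add() lands before the larger element
                it.add(item);  // O(1): the iterator already holds the position
                return;
            }
        }
        it.add(item); // the iterator is at the end, so this appends
    }

    public static void main(String[] args) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int value : new int[] {5, 1, 3, 2, 4}) {
            insertSorted(list, value);
        }
        System.out.println(list); // [1, 2, 3, 4, 5]
    }
}
```

With this change each insert costs O(M), which brings the whole sort back to the O(N^2) that insertion sort inherently has.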
What about using a faster sorting algorithm?
Here is something known as QuickSort. It's way faster than simple quadratic sorts on larger data sets. QuickSort has an average case of O(n log n), while insertion sort has an average case of O(n^2). Big difference, isn't it?
Sample implementation
QuickSort Class
import java.util.*;
public class QuickSort{
public static void swap(int A[] , int x, int y){
int temp = A[x];
A[x] = A[y];
A[y] = temp;
}
public static int[] QSort(int A[],int L, int U){
Random randomGenerator = new Random();
if ( L >= U){
return A;
}
if (L < U) {
/*
Partition the array around the pivot, which is eventually placed
in the correct location "p"
*/
int randomInt = L + randomGenerator.nextInt(U-L);
swap(A,L,randomInt);
int T = A[L];
int p = L;
for(int i= L+1; i<= U; i++){
if (T > A[i]){
p = p+1;
swap(A,p,i);
}
}
/* Recursively call the QSort(int u, int l) function, this deals with
the upper pointer first then the lower.
*/
swap(A,L,p);
QSort(A,p+1,U);
QSort(A,L, p-1);
}
return A;
}
}
Sample Main
import java.util.*;
public class Main{
public static void main(String [] args){
int[] intArray = {1,3,2,4,56,0,4,2,4,7,80,120,99,9,10,67,101,123,12,-1,-8};
System.out.printf("Original Array was:\n%s\n\n",Arrays.toString(intArray));
System.out.printf("Size of Array is: %d\n\n",intArray.length);
QuickSort.QSort(intArray, 0, intArray.length - 1);
System.out.println("The sorted array is:");
System.out.println(Arrays.toString(intArray));
}
}
The above example sorts an int array, but you can easily edit it to sort any object (for example, Entry in your case). I'll let you figure that out yourself.
Good luck.
list.get(i) and list.add(i, e) each take O(n) time every time they are called. You should avoid them while traversing your list.
Instead, if you have to write your own linked list, keep track of the element you are currently at by replacing i++ and get(i) with elem = elem.next or elem = elem.getnext() (or something else, depending on how you implemented your linked list). Then you add an element by doing:
elem.next.parent = e;
e.next = elem.next;
elem.next = e;
e.parent = elem;
This example works for a doubly linked list, where parent is the link to the previous node and elem is the node you are currently comparing with the object you want to add; it inserts e directly after elem.
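Put together, a hand-written list along those lines might look like the sketch below. Everything in it (the SortedIntList class, its Node type with prev and next fields, and the toList helper) is an assumption made for illustration, since the asker's own list class is not shown; it stores int values to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written doubly linked list that keeps its values sorted.
final class SortedIntList {

    private static final class Node {
        final int value;
        Node prev;
        Node next;

        Node(int value) {
            this.value = value;
        }
    }

    private Node head;
    private Node tail;

    /** Walks the nodes once and links the new node in front of the first larger value. */
    void insert(int value) {
        Node fresh = new Node(value);
        Node current = head;
        while (current != null && current.value <= value) {
            current = current.next; // replaces i++ and get(i)
        }
        if (current == null) {
            // No larger value found: append at the tail (also covers the empty list).
            fresh.prev = tail;
            if (tail != null) {
                tail.next = fresh;
            } else {
                head = fresh;
            }
            tail = fresh;
        } else {
            // Link fresh directly before current.
            fresh.next = current;
            fresh.prev = current.prev;
            if (current.prev != null) {
                current.prev.next = fresh;
            } else {
                head = fresh;
            }
            current.prev = fresh;
        }
    }

    /** Copies the values into a java.util.List, front to back. */
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedIntList list = new SortedIntList();
        for (int v : new int[] {3, 1, 2, 2, 0}) {
            list.insert(v);
        }
        System.out.println(list.toList()); // [0, 1, 2, 2, 3]
    }
}
```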

How to list elements maintaining order and reverse iteration?

What is the best list/set/array in Java that combines the following aspects:
maintain order of added elements
make it possible to iterate both forwards and backwards
of course good performance
I thought about a LinkedList; I could then insert elements with add(0, element), which would simulate a reverse order. Most of the time I will be iterating backwards, so with this I can just iterate through.
And if not, I can use list.listIterator() with hasPrevious() and previous().
But are there better approaches?
ArrayList will probably be your best bet. You can iterate through it in the following manner:
for (ListIterator it = list.listIterator(list.size()); it.hasPrevious();) {
Object value = it.previous();
}
A LinkedList will work, but it has more object creation overhead since it needs to instantiate a node for each element you store.
If you can access elements by index and wish to iterate over the collection, you can use a List, whose get(index) method returns the object at that position. Arrays allow this too, since you can just reference the index as normal; however, if your array might grow, a Collection is going to be easier to use.
You can use List.size() and step through the elements with a for loop rather than an Iterator object, which lets you iterate over the list both forwards and backwards. For example:
List<AnObject> myList = new ArrayList<AnObject>();
// Add things to the list
// Forwards
for (int i = 0; i < myList.size(); i++) {
    AnObject myObject = myList.get(i);
}
// Backwards
for (int i = myList.size() - 1; i >= 0; i--) {
    AnObject myObject = myList.get(i);
}
A plain Set such as HashSet is not applicable, as it does not maintain ordering.
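As a quick check that the two backward-iteration styles discussed above visit the same elements, here is a small typed sketch. ReverseIterationDemo and its two methods are hypothetical names for this example; the index-based version is best kept to random-access lists such as ArrayList, because get(i) on a LinkedList has to walk the list on every call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

// Hypothetical demo class comparing two ways of walking a list backwards.
final class ReverseIterationDemo {

    /** Walks backwards with a ListIterator positioned at the end of the list. */
    static <T> List<T> reversedWithIterator(List<T> list) {
        List<T> out = new ArrayList<>(list.size());
        for (ListIterator<T> it = list.listIterator(list.size()); it.hasPrevious();) {
            out.add(it.previous());
        }
        return out;
    }

    /** Walks backwards by index; efficient for ArrayList, slow for LinkedList. */
    static <T> List<T> reversedWithIndex(List<T> list) {
        List<T> out = new ArrayList<>(list.size());
        for (int i = list.size() - 1; i >= 0; i--) {
            out.add(list.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> added = new ArrayList<>(Arrays.asList("first", "second", "third"));
        System.out.println(reversedWithIterator(added)); // [third, second, first]
        System.out.println(reversedWithIndex(added));    // [third, second, first]
    }
}
```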

Any big difference between using contains or loop through a list?

Performance wise, is there really a big difference between using:
ArrayList.contains(o) vs foreach|iterator
LinkedList.contains(o) vs foreach|iterator
Of course, for the foreach|iterator loops, I'll have to explicitly compare the objects and return true or false accordingly.
The object I'm comparing is an object where equals() and hashcode() are both properly overridden.
EDIT: Don't need to know about containsValue after all, sorry about that. And yes, I'm stupid... I realized how stupid my question was about containsKey vs foreach, nevermind about that, I don't know what I was thinking. I basically want to know about the ones above (edited out the others).
EDITED:
With the new form of the question no longer including HashMap and TreeMap, my answer is entirely different. I now say no.
I'm sure that other people have answered this, but in both LinkedList and ArrayList, contains() just calls indexOf(), which iterates over the collection.
It's possible that there are tiny performance differences, both between LinkedList and ArrayList and between contains and foreach, but there aren't any big differences.
This makes no difference, since contains(o) calls indexOf(o), which simply loops like this:
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
(Checked in ArrayList)
Without benchmarking, contains should be faster or the same in all cases.
For 1 and 2, it doesn't need to call the iterator methods. It can loop internally. Both ArrayList and LinkedList implement contains in terms of indexOf
ArrayList - indexOf is a C-style for loop on the backing array.
LinkedList - indexOf walks the linked list in a C-style for loop.
For 3 and 4, you have to distinguish between containsKey and containsValue.
3. HashMap, containsKey is O(1). It works by hashing the key, getting the associated bucket, then walking the linked list. containsValue is O(n) and works by simply checking every value in every bucket in a nested for loop.
4. TreeMap, containsKey is O(log n). It checks whether it's in range, then searches the red-black tree. containsValue, which is O(n), uses an in-order walk of the tree.
ArrayList.contains does
return indexOf(o) >= 0;
where
public int indexOf(Object o) {
if (o == null) {
for (int i = 0; i < size; i++)
if (elementData[i]==null)
return i;
} else {
for (int i = 0; i < size; i++)
if (o.equals(elementData[i]))
return i;
}
return -1;
}
It's similar for LinkedList, only it uses .next() to iterate through the elements, so not much difference there.
public int indexOf(Object o) {
int index = 0;
if (o==null) {
for (Entry e = header.next; e != header; e = e.next) {
if (e.element==null)
return index;
index++;
}
} else {
for (Entry e = header.next; e != header; e = e.next) {
if (o.equals(e.element))
return index;
index++;
}
}
return -1;
}
HashMap.containsKey uses the hash of the key to fetch all keys with that hash (which is fast) and then uses equals only on those keys, so there's an improvement there; but containsValue() goes through the values with a for loop.
TreeMap.containsKey seems to do an informed search using a comparator to find the key faster, so still better; but containsValue still seems to go through the entire tree until it finds a value.
Overall I think you should use the methods, since they're easier to write than doing a loop every time :).
I think using contains is better because the library implementation is generally at least as efficient as a manual implementation of the same thing. Also check whether you can pass a comparator that you have written, either during object construction or afterwards, to take care of your custom equals and hashCode logic.
Thanks,
Krishna
Traversing the container with foreach/iterator is always O(n) time.
ArrayList/LinkedList search is O(n) as well.
HashMap.containsKey() is O(1) expected time.
TreeMap.containsKey() is O(log n) time.
For both HashMap and TreeMap, containsValue() is O(n), but there may be other map implementations in which containsValue() is optimized to be as fast as containsKey().
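To make the comparison concrete, here is a small sketch of a hand-written search next to the library calls. LookupDemo and containsByLoop are names invented for this example, and the code only demonstrates that the results agree; it is not a benchmark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class, not a benchmark.
final class LookupDemo {

    /** Manual linear search: O(n), the same work List.contains does internally. */
    static <T> boolean containsByLoop(Collection<T> items, Object target) {
        for (T item : items) {
            if (Objects.equals(target, item)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        List<Integer> linkedList = new LinkedList<>(arrayList);

        // Same answer either way; both approaches scan the elements in order.
        System.out.println(arrayList.contains(3) == containsByLoop(arrayList, 3));
        System.out.println(linkedList.contains(9) == containsByLoop(linkedList, 9));

        Map<String, Integer> ages = new HashMap<>();
        ages.put("ann", 31);
        ages.put("bob", 27);
        System.out.println(ages.containsKey("ann")); // hash lookup
        System.out.println(ages.containsValue(27));  // scans all values
    }
}
```

If membership tests dominate and order does not matter, a HashSet or HashMap key lookup avoids the linear scan altogether.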
