Java sorting with comparator and swap function

I need a sorting function with a custom comparator and a custom swap function. I could write one myself, but I'm wondering whether someone has already done it. The Java runtime contains many specialized sorting functions for arrays of primitive types, objects, etc., but none of them take a swap function as an argument. A Google search also didn't find anything useful.
public interface IntComparator {
    int compare(int a, int b);
}

public interface IntSwap {
    void swap(int a, int b);
}

public static void sort(IntComparator compFn, IntSwap swapFn, int off, int len);

Here is what I was looking for. It's based on the Java runtime's algorithm for sorting integers (the tuned quicksort used by java.util.Arrays in older JDKs). With a proper implementation of the Sortable interface, it can sort just about anything.
public class Sort {

    public static void sort(Sortable sortable, int off, int len) {
        // Insertion sort on smallest arrays
        if (len < 7) {
            for (int i = off; i < len + off; i++) {
                for (int j = i; j > off && sortable.compare(j - 1, j) > 0; j--) {
                    sortable.swap(j, j - 1);
                }
            }
            return;
        }

        // Choose a partition element, v
        int m = off + (len >> 1); // Small arrays, middle element
        if (len > 7) {
            int l = off;
            int n = off + len - 1;
            if (len > 40) { // Big arrays, pseudomedian of 9
                int s = len / 8;
                l = med3(sortable, l, l + s, l + 2 * s);
                m = med3(sortable, m - s, m, m + s);
                n = med3(sortable, n - 2 * s, n - s, n);
            }
            m = med3(sortable, l, m, n); // Mid-size, med of 3
        }

        // Establish Invariant: v* (<v)* (>v)* v*
        int a = off, b = a, c = off + len - 1, d = c;
        while (true) {
            while (b <= c && sortable.compare(b, m) <= 0) {
                if (sortable.compare(b, m) == 0) {
                    sortable.swap(a, b);
                    m = a;
                    a++;
                }
                b++;
            }
            while (c >= b && sortable.compare(c, m) >= 0) {
                if (sortable.compare(c, m) == 0) {
                    sortable.swap(c, d);
                    m = d;
                    d--;
                }
                c--;
            }
            if (b > c) {
                break;
            }
            sortable.swap(b++, c--);
        }

        // Swap partition elements back to middle
        int s, n = off + len;
        s = Math.min(a - off, b - a);
        vecswap(sortable, off, b - s, s);
        s = Math.min(d - c, n - d - 1);
        vecswap(sortable, b, n - s, s);

        // Recursively sort non-partition-elements
        if ((s = b - a) > 1) {
            sort(sortable, off, s);
        }
        if ((s = d - c) > 1) {
            sort(sortable, n - s, s);
        }
    }

    private static int med3(Sortable sortable, int a, int b, int c) {
        return sortable.compare(a, b) < 0
                ? (sortable.compare(b, c) < 0 ? b : sortable.compare(a, c) < 0 ? c : a)
                : (sortable.compare(b, c) > 0 ? b : sortable.compare(a, c) > 0 ? c : a);
    }

    private static void vecswap(Sortable sortable, int a, int b, int n) {
        for (int i = 0; i < n; i++, a++, b++) {
            sortable.swap(a, b);
        }
    }
}
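For illustration, here is a minimal sketch (my own example, with hypothetical class and field names) of a Sortable implementation that keeps two parallel arrays in sync while ordering by the first one; the Sortable interface itself is spelled out further down:

public class ParallelArraysSortable implements Sortable {

    private final int[] keys;      // the array we order by
    private final String[] values; // kept aligned with keys

    public ParallelArraysSortable(int[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    @Override
    public int compare(int first, int second) {
        return Integer.compare(keys[first], keys[second]);
    }

    @Override
    public void swap(int first, int second) {
        int k = keys[first]; keys[first] = keys[second]; keys[second] = k;
        String v = values[first]; values[first] = values[second]; values[second] = v;
    }
}

Sorting both arrays together is then Sort.sort(new ParallelArraysSortable(keys, values), 0, keys.length);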

I need to swap indices in two arrays. I know that I could sort a two-dimensional array instead, but that would increase the required memory.
No. If I understand you correctly, it does not result in any overhead.
Remember that Java does not store arrays or objects directly in variables (or in arrays!); it stores references. Even if each element referred to from an array is 40 bytes large, it is stored in the array as a reference.
Thus, I suggest you go with the built-in sorting mechanisms. They won't shuffle around lots of data, only the references.
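A sketch of that idea (mine; the names are hypothetical, and it needs Java 8): sort an array of Integer indices with the built-in sort, then read both payload arrays through the sorted indices, so only references move:

import java.util.Arrays;
import java.util.Comparator;

public class IndexSortDemo {
    public static void main(String[] args) {
        int[] keys = {42, 7, 19};
        String[] names = {"forty-two", "seven", "nineteen"};

        // sort indices by the keys they point at; the payload arrays never move
        Integer[] idx = {0, 1, 2};
        Arrays.sort(idx, Comparator.comparingInt(i -> keys[i]));

        for (int i : idx) {
            System.out.println(keys[i] + " -> " + names[i]); // 7, 19, 42 order
        }
    }
}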

Because sort() for an array of Object is stable, you may be able to get useful information inside a custom Comparator. This one counts swaps while sorting by String length.
import java.util.Arrays;
import java.util.Comparator;

/** @see http://stackoverflow.com/questions/4983746 */
public class SortTest {

    private static class LengthComparator implements Comparator<String> {

        private int count;

        public int compare(String s1, String s2) {
            int a = s1.length();
            int b = s2.length();
            if (a < b) {
                return -1;
            } else if (a > b) {
                count++;
                return 1;
            } else {
                return 0;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String[] sa = {"One", "Two", "Three", "Four", "Five"};
        System.out.println(Arrays.toString(sa));
        LengthComparator byLength = new LengthComparator();
        Arrays.sort(sa, byLength);
        System.out.println(Arrays.toString(sa));
        System.out.println(byLength.count);
    }
}
Console:
[One, Two, Three, Four, Five]
[One, Two, Four, Five, Three]
2

Regarding swap: Java passes arguments by value, so methods like swap(int a, int b) and swap(Object a, Object b) don't work as expected; they cannot swap the caller's variables.
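A quick demonstration of that point (snippet mine): the two-int version mutates only its local copies, while passing the array plus indices works because the copied reference still points at the same array:

public class SwapDemo {
    // has no effect on the caller's variables: Java copies the values
    static void swap(int a, int b) {
        int t = a; a = b; b = t;
    }

    // works: the array reference is copied, but both copies point
    // to the same array, so element writes are visible to the caller
    static void swap(int[] arr, int i, int j) {
        int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }

    public static void main(String[] args) {
        int x = 1, y = 2;
        swap(x, y);
        System.out.println(x + " " + y); // still "1 2"

        int[] a = {1, 2};
        swap(a, 0, 1);
        System.out.println(a[0] + " " + a[1]); // "2 1"
    }
}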

If you propose these interfaces, at least add some comments describing what they should do. From the discussion I gather that you want something like this:
/**
 * A Sortable represents an indexed collection of comparable
 * elements.
 * It does not offer direct access to its elements, only
 * comparison and swapping by indices.
 *
 * In the method specifications we are using this[i] to
 * mean the element at index i.
 */
public interface Sortable {

    /**
     * Compares two elements by their indices.
     * @return -1 if this[first] < this[second],
     *          0 if this[first] = this[second],
     *          1 if this[first] > this[second]
     * @throws IndexOutOfBoundsException if one
     *         or both indices are outside of the
     *         limits of this sequence.
     */
    public int compare(int first, int second);

    /**
     * Swaps two elements by their indices.
     * This is roughly equivalent to this sequence:
     * <pre>
     * temp = this[first];
     * this[first] = this[second];
     * this[second] = temp;
     * </pre>
     */
    public void swap(int first, int second);
}

interface Sorter {

    /**
     * Sorts an interval of a sequence.
     * @param sequence the sequence to be sorted.
     * @param off the start of the interval to be sorted.
     * @param len the length of the interval to be sorted.
     */
    public void sort(Sortable sequence, int off, int len);
}
And then you could have your sort algorithm implement Sorter, while your data structure implements Sortable.
Of course one could split the two functions of Sortable into an IndexComparator and an IndexSwapper (not Int..., as you named them), but they are both directly coupled to your data structure (consisting of your two arrays).

Related

Algorithms: Hybrid MergeSort and InsertionSort Execution Time

Good day SO community,
I am a CS student currently performing an experiment combining MergeSort and InsertionSort. It is understood that, below a certain threshold S, InsertionSort has a quicker execution time than MergeSort. Hence, by merging both sorting algorithms, the total runtime should be optimized.
However, after running the experiment many times with a sample size of 1000 and varying sizes of S, the results do not give a definitive answer each time; about half of the runs were not definitive. I repeated the runs with sample sizes of 3500 and 500,000 as well (plots omitted; the y-axis of the 500,000 plot is in milliseconds).
Logically, the hybrid MergeSort should be faster when S <= 10, since InsertionSort has no recursive overhead; however, the results of my mini experiment say otherwise.
Currently, these are the Time Complexities taught to me:
MergeSort: O(n log n)
InsertionSort:
Best Case: θ(n)
Worst Case: θ(n^2)
Finally, I have found an online source: https://cs.stackexchange.com/questions/68179/combining-merge-sort-and-insertion-sort that states that:
Hybrid MergeInsertionSort:
Best Case: θ(n + n log (n/x))
Worst Case: θ(nx + n log (n/x))
I would like to ask if there are results in the CS community that show definitive proof that a hybrid MergeSort algorithm will work better than a normal MergeSort algorithm below a certain threshold, S, and if so, why?
Thank you so much, SO community. It might be a trivial question, but it really will clarify many questions that I currently have regarding time complexities and stuff :)
Note: I am using Java for the coding of the algorithm, and the runtime could be affected by the way Java stores data in memory.
Code in Java:
public static int mergeSort2(int n, int m, int s, int[] arr) {
    int mid = (n + m) / 2, right = 0, left = 0;
    if (m - n <= s)
        return insertSort(arr, n, m);
    else {
        right = mergeSort2(n, mid, s, arr);
        left = mergeSort2(mid + 1, m, s, arr);
        return right + left + merge(n, m, s, arr);
    }
}

public static int insertSort(int[] arr, int n, int m) {
    int temp, comp = 0;
    for (int i = n + 1; i <= m; i++) {
        for (int j = i; j > n; j--) {
            comp++;
            comparison2++;
            if (arr[j] < arr[j - 1]) {
                temp = arr[j];
                arr[j] = arr[j - 1];
                arr[j - 1] = temp;
            } else
                break;
        }
    }
    return comp;
}

public static void shiftArr(int start, int m, int[] arr) {
    for (int i = m; i > start; i--)
        arr[i] = arr[i - 1];
}

public static int merge(int n, int m, int s, int[] arr) {
    int comp = 0;
    if (m - n <= s)
        return 0;
    int mid = (n + m) / 2;
    int temp, i = n, j = mid + 1;
    while (i <= mid && j <= m) {
        comp++;
        comparison2++;
        if (arr[i] >= arr[j]) {
            if (i == mid++ && j == m && (arr[i] == arr[j]))
                break;
            temp = arr[j];
            shiftArr(i, j++, arr);
            arr[i] = temp;
            if (arr[i + 1] == arr[i]) {
                i++;
            }
        }
        i++;
    }
    return comp;
}
The example code isn't a conventional merge sort: the merge function shifts elements within the array instead of merging runs between the original array and a temporary working array and back.
I tested top-down and bottom-up merge sorts, and both take about 42 ms == 0.042 seconds to sort 500,000 32-bit integers, versus the apparent results in the graph, which are 1000 times slower at about 42 seconds instead of 42 ms. I also tested with 10,000,000 integers, which takes a bit over 1 second to sort.
In the past, using C++, I compared a bottom up merge sort with a hybrid bottom up merge / insertion sort, and for 16 million (2^24 == 16,777,216) 32 bit integers, the hybrid sort was about 8% faster with S == 16. S == 64 was slightly slower than S == 16. Visual Studio std::stable_sort is a variation of bottom up merge sort (the temp array is 1/2 the size of the original array) and insertion sort, and uses S == 32.
For small arrays, insertion sort is quicker than merge sort, a combination of cache locality and fewer instructions needed to sort a small array with insertion sort. For pseudo random data and S == 16 to 64, insertion sort was about twice as fast as merge sort.
The relative gain diminishes as the array size increases. Considering the effect on bottom up merge sort, with S == 16, only 4 merge passes are optimized. In my test case with 2^24 == 16,777,216 elements, that's 4/24 = 1/6 ~= 16.7% of the number of passes, resulting in about an 8% improvement (so the insertion sort is about twice as fast as merge sort for those 4 passes). The total times were about 1.52 seconds for the merge only sort, and about 1.40 seconds for the hybrid sort, a 0.12 second gain on a process that only takes 1.52 seconds. For a top down merge sort, with S == 16, the 4 deepest levels of recursion would be optimized.
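For contrast with the shifting merge in the question, here is a minimal sketch (mine, with hypothetical names; not the code timed above) of a conventional hybrid top-down merge sort: runs of at most S elements are insertion-sorted, and merging goes through a temporary working array:

import java.util.Arrays;

public class HybridMergeSort {

    static final int S = 32; // threshold below which insertion sort is used

    public static void sort(int[] a) {
        mergeSort(a, new int[a.length], 0, a.length - 1);
    }

    // sorts a[lo..hi] inclusive, using tmp as scratch space
    private static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
        if (hi - lo + 1 <= S) {          // small run: insertion sort
            for (int i = lo + 1; i <= hi; i++) {
                int t = a[i], j = i - 1;
                while (j >= lo && a[j] > t)
                    a[j + 1] = a[j--];
                a[j + 1] = t;
            }
            return;
        }
        int mid = lo + (hi - lo) / 2;
        mergeSort(a, tmp, lo, mid);
        mergeSort(a, tmp, mid + 1, hi);
        // merge a[lo..mid] and a[mid+1..hi] through tmp; ties take the
        // left run first, so the merge is stable
        System.arraycopy(a, lo, tmp, lo, hi - lo + 1);
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if (i > mid)              a[k] = tmp[j++];
            else if (j > hi)          a[k] = tmp[i++];
            else if (tmp[j] < tmp[i]) a[k] = tmp[j++];
            else                      a[k] = tmp[i++];
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 5, 8, 9]
    }
}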
Update - Example Java code for a hybrid in-place merge sort / insertion sort with O(n log(n)) time complexity. (Note: auxiliary storage is still consumed on the stack due to recursion.) The in-place part is accomplished during merge steps by swapping the data in the area merged into with the data in the area merged from. This is not a stable sort (the order of equal elements is not preserved, due to the swapping during merge steps). Sorting 500,000 integers takes about 1/8th of a second, so I increased this to 16 million (2^24 == 16,777,216) integers, which takes a bit over 4 seconds. Without the insertion sort, the sort takes about 4.524 seconds; with the insertion sort and S == 64, the sort takes about 4.150 seconds, about an 8.8% gain. With essentially the same code in C, the improvement was smaller: from 2.88 seconds to 2.75 seconds, about a 4.5% gain.
package msortih;

import java.util.Random;

public class msortih {

    static final int S = 64; // use insertion sort if size <= S

    static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    // a[w:] = merged a[i:m]+a[j:n]
    // a[i:] = reordered a[w:]
    static void wmerge(int[] a, int i, int m, int j, int n, int w) {
        while (i < m && j < n)
            swap(a, w++, a[i] < a[j] ? i++ : j++);
        while (i < m)
            swap(a, w++, i++);
        while (j < n)
            swap(a, w++, j++);
    }

    // a[w:] = sorted a[b:e]
    // a[b:e] = reordered a[w:]
    static void wsort(int[] a, int b, int e, int w) {
        int m;
        if (e - b > 1) {
            m = b + (e - b) / 2;
            imsort(a, b, m);
            imsort(a, m, e);
            wmerge(a, b, m, m, e, w);
        } else {
            while (b < e)
                swap(a, b++, w++);
        }
    }

    // in place merge sort a[b:e]
    static void imsort(int[] a, int b, int e) {
        int m, n, w, x;
        int t;
        // if <= S elements, use insertion sort
        if (e - b <= S) {
            for (n = b + 1; n < e; n++) {
                t = a[n];
                m = n - 1;
                while (m >= b && a[m] > t) {
                    a[m + 1] = a[m];
                    m--;
                }
                a[m + 1] = t;
            }
            return;
        }
        if (e - b > 1) {
            // split a[b:e]
            m = b + (e - b) / 2;
            w = b + e - m;
            // wsort -> a[w:e] = sorted a[b:m]
            //          a[b:m] = reordered a[w:e]
            wsort(a, b, m, w);
            while (w - b > 2) {
                // split a[b:w], w = new mid point
                n = w;
                w = b + (n - b + 1) / 2;
                x = b + n - w;
                // wsort -> a[b:x] = sorted a[w:n]
                //          a[w:n] = reordered a[b:x]
                wsort(a, w, n, b);
                // wmerge -> a[w:e] = merged a[b:x]+a[n:e]
                //           a[b:x] = reordered a[w:n]
                wmerge(a, b, x, n, e, w);
            }
            // insert a[b:w] into a[b:e] using left shift
            for (n = w; n > b; --n) {
                t = a[n - 1];
                for (m = n; m < e && a[m] < t; ++m)
                    a[m - 1] = a[m];
                a[m - 1] = t;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = new int[16 * 1024 * 1024];
        Random r = new Random(0);
        for (int i = 0; i < a.length; i++)
            a[i] = r.nextInt();
        long bgn, end;
        bgn = System.currentTimeMillis();
        imsort(a, 0, a.length);
        end = System.currentTimeMillis();
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                System.out.println("failed");
                break;
            }
        }
        System.out.println("milliseconds " + (end - bgn));
    }
}

Codility PermMissingElem gives strange results

The task is the following:
A zero-indexed array A consisting of N different integers is given. The array contains integers in the range [1..(N + 1)], which means that exactly one element is missing.
Your goal is to find that missing element.
Write a function:
class Solution { public int solution(int[] A); }
that, given a zero-indexed array A, returns the value of the missing element.
For example, given array A such that:
A[0] = 2
A[1] = 3
A[2] = 1
A[3] = 5
the function should return 4, as it is the missing element.
Assume that:
N is an integer within the range [0..100,000];
the elements of A are all distinct;
each element of array A is an integer within the range [1..(N + 1)].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(1), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Now, my solution is the following:
// you can also use imports, for example:
// import java.util.*;

// you can use System.out.println for debugging purposes, e.g.
// System.out.println("this is a debug message");

class Solution {
    public int solution(int[] A) {
        long nPlusOneSum = (A.length + 2) * (A.length + 1) / 2;
        long arraySum = 0;
        for (int element : A)
            arraySum += element;
        return (int) (nPlusOneSum - arraySum);
    }
}
The problem is that I get failing results on the large_range and large2 tests (screenshot of the Codility results omitted), and I don't quite understand why.
I made a sort of test myself which should simulate a large array:
import org.junit.Before;
import org.junit.Test;

public class SomeOtherTest {

    int[] maxArray;
    int N = 100000;

    @Before
    public void setUp() {
        maxArray = new int[N];
        for (int i = 0; i < maxArray.length; i++) {
            maxArray[i] = i + 1;
        }
        maxArray[0] = maxArray.length + 1;
    }

    @Test
    public void test() {
        System.out.println(solution(maxArray));
    }

    public int solution(int[] A) {
        long nPlusOneSum = (A.length + 2) * (A.length + 1) / 2;
        long arraySum = 0;
        for (int element : A)
            arraySum += element;
        return (int) (nPlusOneSum - arraySum);
    }
}
but it gives me the correct answer, which is 1 (I used JDK 1.8, the same as Codility).
A link to the test results: https://codility.com/demo/results/demoWAS9FA-5FA/
EDIT:
this solution:
class Solution {
    public int solution(int[] A) {
        long nPlusOneSum = (A.length + 2) * (A.length + 1) / 2;
        for (int element : A)
            nPlusOneSum -= element;
        return (int) nPlusOneSum;
    }
}
gives the same result: https://codility.com/demo/results/demoWAS9FA-5FA/
EDIT2
As soon as I introduced a temporary variable to hold the array length, the test passed.
Code:
class Solution {
    public int solution(int[] A) {
        long numberOfElementsPlusOne = A.length + 1;
        long nPlusOneSum = numberOfElementsPlusOne * (numberOfElementsPlusOne + 1) / 2;
        for (int element : A)
            nPlusOneSum -= element;
        return (int) nPlusOneSum;
    }
}
result: https://codility.com/demo/results/demoE82PUM-JCA/
EDIT3
The weird thing is that the test still produces correct results, even though an overflow occurs during its evaluation:
nPlusOneSum overflows and gets the value 705182705 instead of 5000150001.
arraySum doesn't overflow and gets the value 5000150000.
Then, at the return statement, nPlusOneSum - arraySum evaluates to -4294967295, which for some reason, after the conversion to (int), gives the correct value 1.
What exactly happens when an operation overflows its type in Java?
The trick is that A.length is an int. You should convert it to long before using it:
public int solution(int[] A) {
    long sum = 0;
    for (int element : A) {
        sum += element;
    }
    long expectedSum = (((long) A.length + 1) * ((long) A.length + 2)) / 2;
    return (int) (expectedSum - sum);
}
According to the Java language specification (http://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.17):
"The type of a multiplicative expression is the promoted type of its operands."
So the result of multiplying two ints is also an int, which silently overflows for array lengths around 100,000; the solution is to change the type of the operands to long.
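A short demonstration of both effects (snippet mine): the int multiplication wraps before any widening to long happens, and a final (int) cast keeps only the low 32 bits, which is why the wrong intermediate sum can still cancel out to the right answer:

public class OverflowDemo {
    public static void main(String[] args) {
        int n = 100000;
        long bad = (n + 2) * (n + 1) / 2;         // int math wraps: 705182705
        long good = ((long) n + 2) * (n + 1) / 2; // long math: 5000150001
        System.out.println(bad + " vs " + good);

        // narrowing to int keeps the low 32 bits: -4294967295 == 1 - 2^32,
        // so the cast yields 1, the asker's "mysteriously correct" result
        System.out.println((int) -4294967295L);   // prints 1
    }
}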
EDIT
"the weird thing is that the test still produces correct results, even though an overflow occurs during its evaluation"
There is a small catch here. Suppose the length of your array is 100,000. You are finding the sum using the formula (N * (N + 1)) / 2, which means (100,000 * 100,001) / 2: the multiplication alone already exceeds the maximum int value. Hence the error you have seen.
The version below additionally halves the even factor before multiplying, and declares realLen as long so the product cannot overflow:
public int solution(int[] arr) {
    // long, so the multiplications below are done in 64-bit arithmetic
    long realLen = arr.length + 1;
    long realSum = 0;
    if (realLen % 2 == 0) {
        realSum = (realLen / 2) * (realLen + 1);
    } else {
        realSum = realLen * ((realLen + 1) / 2);
    }
    for (int i = 0; i < arr.length; i++) {
        realSum = realSum - arr[i];
    }
    return (int) realSum;
}

How to calculate the median of an array?

I'm trying to calculate the total, mean, and median of an array that's populated by input received from a text field. I've managed to work out the total and the mean; I just can't get the median to work. I think the array needs to be sorted before I can do this, but I'm not sure how to do that. Is this the problem, or is there another one that I didn't find? Here is my code:
import java.applet.Applet;
import java.awt.Graphics;
import java.awt.*;
import java.awt.event.*;

public class whileloopq extends Applet implements ActionListener {

    Label label;
    TextField input;
    int num;
    int index;
    int[] numArray = new int[20];
    int sum;
    int total;
    double avg;
    int median;

    public void init() {
        label = new Label("Enter numbers");
        input = new TextField(5);
        add(label);
        add(input);
        input.addActionListener(this);
        index = 0;
    }

    public void actionPerformed(ActionEvent ev) {
        int num = Integer.parseInt(input.getText());
        numArray[index] = num;
        index++;
        if (index == 20)
            input.setEnabled(false);
        input.setText("");
        sum = 0;
        for (int i = 0; i < numArray.length; i++) {
            sum += numArray[i];
        }
        total = sum;
        avg = total / index;
        median = numArray[numArray.length / 2];
        repaint();
    }

    public void paint(Graphics graf) {
        graf.drawString("Total = " + Integer.toString(total), 25, 85);
        graf.drawString("Average = " + Double.toString(avg), 25, 100);
        graf.drawString("Median = " + Integer.toString(median), 25, 115);
    }
}
The Arrays class in Java has a static sort function, which you can invoke with Arrays.sort(numArray).
Arrays.sort(numArray);
double median;
if (numArray.length % 2 == 0)
    median = ((double) numArray[numArray.length / 2] + (double) numArray[numArray.length / 2 - 1]) / 2;
else
    median = (double) numArray[numArray.length / 2];
Sorting the array is unnecessary and inefficient. There's a variation of QuickSort (QuickSelect) which has an average run time of O(n); if you sort first, you're looking at O(n log n). It actually finds the nth smallest item in a list; for a median, you just use n = half the list length. Let's call it quickNth(list, n).
The concept is that to find the nth smallest, choose a 'pivot' value. (Exactly how you choose it isn't critical; if you know the data will be thoroughly random, you can take the first item on the list.)
Split the original list into three smaller lists:
One with values smaller than the pivot.
One with values equal to the pivot.
And one with values greater than the pivot.
You then have three cases:
The "smaller" list has >= n items. In that case, you know that the nth smallest is in that list. Return quickNth(smaller, n).
The smaller list has < n items, but the smaller and equal lists together have >= n items. In this case, the nth smallest is equal to any item in the "equal" list; you're done.
n is greater than the sum of the lengths of the smaller and equal lists. In that case, you can essentially skip over those two, and adjust n accordingly. Return quickNth(greater, n - length(smaller) - length(equal)).
Done.
If you're not sure that the data is thoroughly random, you need to be more sophisticated about choosing the pivot. Taking the median of the first value in the list, the last value in the list, and the one midway between the two works pretty well.
If you're very unlucky with your choice of pivots, and you always choose the smallest or highest value as your pivot, this takes O(n^2) time; that's bad. But, it's also very unlikely if you choose your pivot with a decent algorithm.
Sample code:
import java.util.*;

public class Utility {

    /****************
     * @param coll an ArrayList of Comparable objects
     * @return the median of coll
     *****************/
    public static <T extends Number> double median(ArrayList<T> coll, Comparator<T> comp) {
        double result;
        int n = coll.size() / 2;
        if (coll.size() % 2 == 0) // even number of items; find the middle two and average them
            result = (nth(coll, n - 1, comp).doubleValue() + nth(coll, n, comp).doubleValue()) / 2.0;
        else // odd number of items; return the one in the middle
            result = nth(coll, n, comp).doubleValue();
        return result;
    } // median(coll)

    /*****************
     * @param coll a collection of Comparable objects
     * @param n the position of the desired object, using the ordering defined on the list elements
     * @return the nth smallest object
     *******************/
    public static <T> T nth(ArrayList<T> coll, int n, Comparator<T> comp) {
        T result, pivot;
        ArrayList<T> underPivot = new ArrayList<>(), overPivot = new ArrayList<>(), equalPivot = new ArrayList<>();
        // choosing a pivot is a whole topic in itself.
        // this implementation uses the simple strategy of grabbing something from the middle of the ArrayList.
        pivot = coll.get(n / 2);
        // split coll into 3 lists based on comparison with the pivot
        for (T obj : coll) {
            int order = comp.compare(obj, pivot);
            if (order < 0) // obj < pivot
                underPivot.add(obj);
            else if (order > 0) // obj > pivot
                overPivot.add(obj);
            else // obj = pivot
                equalPivot.add(obj);
        } // for each obj in coll
        // recurse on the appropriate list
        if (n < underPivot.size())
            result = nth(underPivot, n, comp);
        else if (n < underPivot.size() + equalPivot.size()) // equal to pivot; just return it
            result = pivot;
        else // everything in underPivot and equalPivot is too small. Adjust n accordingly in the recursion.
            result = nth(overPivot, n - underPivot.size() - equalPivot.size(), comp);
        return result;
    } // nth(coll, n)

    public static void main(String[] args) {
        Comparator<Integer> comp = Comparator.naturalOrder();
        Random rnd = new Random();
        for (int size = 1; size <= 10; size++) {
            ArrayList<Integer> coll = new ArrayList<>(size);
            for (int i = 0; i < size; i++)
                coll.add(rnd.nextInt(100));
            System.out.println("Median of " + coll.toString() + " is " + median(coll, comp));
        } // for a range of possible input sizes
    } // main(args)
} // Utility
If you want to use an external library, here is how the median can be calculated with the Apache Commons Math library.
For more methods and usage, take a look at the API documentation.
import org.apache.commons.math3.stat.descriptive.rank.Median;
.....
......
........
// calculate median
public double getMedian(double[] values) {
    Median median = new Median();
    double medianValue = median.evaluate(values);
    return medianValue;
}
.......
For more on evaluate method AbstractUnivariateStatistic#evaluate
Update
Calculate in program
Generally, the median is calculated using the following two formulas:
If n is odd, then Median (M) = value of the ((n + 1)/2)th item.
If n is even, then Median (M) = [value of the (n/2)th item + value of the (n/2 + 1)th item] / 2.
In your program you have numArray; first you need to sort the array using Arrays#sort:
Arrays.sort(numArray);
int middle = numArray.length / 2;
int medianValue = 0; // declare variable
if (numArray.length % 2 == 1)
    medianValue = numArray[middle];
else
    medianValue = (numArray[middle - 1] + numArray[middle]) / 2;
Arrays.sort(numArray);
int size = numArray.length;
// for odd lengths both indices refer to the same (middle) element
return (numArray[size / 2] + numArray[(size - 1) / 2]) / 2;
Arrays.sort(numArray);
int middle = numArray.length / 2;
if (numArray.length % 2 == 0) {
    int medianA = numArray[middle];
    int medianB = numArray[middle - 1];
    median = (medianA + medianB) / 2;
} else {
    median = numArray[middle]; // middle element, 0-based
}
EDIT: I initially had medianB setting to middle+1 in the even length arrays, this was wrong due to arrays starting count at 0. I have updated it to use middle-1 which is correct and should work properly for an array with an even length.
You can find a good explanation at https://www.youtube.com/watch?time_continue=23&v=VmogG01IjYc
The idea is to use two heaps: a max-heap and a min-heap.
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

class Heap {

    private Queue<Integer> low = new PriorityQueue<>(Comparator.reverseOrder()); // max-heap: lower half
    private Queue<Integer> high = new PriorityQueue<>();                         // min-heap: upper half

    public void add(int number) {
        Queue<Integer> target = low.size() <= high.size() ? low : high;
        target.add(number);
        balance();
    }

    private void balance() {
        while (!low.isEmpty() && !high.isEmpty() && low.peek() > high.peek()) {
            Integer lowHead = low.poll();
            Integer highHead = high.poll();
            low.add(highHead);
            high.add(lowHead);
        }
    }

    public double median() {
        if (low.isEmpty() && high.isEmpty()) {
            throw new IllegalStateException("Heap is empty");
        } else {
            return low.size() == high.size() ? (low.peek() + high.peek()) / 2.0 : low.peek();
        }
    }
}
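For example, feeding a stream of numbers into the class above and querying the running median (usage sketch mine; it assumes the Heap class above is in scope):

public class RunningMedianDemo {
    public static void main(String[] args) {
        Heap heap = new Heap(); // the two-heap class above
        int[] stream = {5, 15, 1, 3};
        for (int n : stream) {
            heap.add(n);
            System.out.println("median so far: " + heap.median());
        }
        // prints 5.0, 10.0, 5.0, 4.0
    }
}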
Try sorting the array first. Then, after it's sorted, if the array has an even number of elements, the mean of the middle two is the median; if it has an odd number, the middle element is the median.
Use Arrays.sort and then take the middle element (in case the number n of elements in the array is odd) or take the average of the two middle elements (in case n is even).
public static long median(long[] l) {
    Arrays.sort(l);
    int middle = l.length / 2;
    if (l.length % 2 == 0) {
        long left = l[middle - 1];
        long right = l[middle];
        return (left + right) / 2;
    } else {
        return l[middle];
    }
}
Here are some examples:
@Test
public void evenTest() {
    long[] l = {
        5, 6, 1, 3, 2, 4
    };
    Assert.assertEquals((3 + 4) / 2, median(l));
}

@Test
public void oddTest() {
    long[] l = {
        5, 1, 3, 2, 4
    };
    Assert.assertEquals(3, median(l));
}
And in case your input is a Collection, you might use Google Guava to do something like this:
public static long median(Collection<Long> numbers) {
    return median(Longs.toArray(numbers)); // requires import com.google.common.primitives.Longs;
}
I was looking at the same statistics problems. The approach you are thinking of is good, and it will work. (An answer about sorting has already been given.)
But in case you are interested in algorithm performance, there are a couple of algorithms that perform better than just sorting the array; one (QuickSelect) is indicated by @Bruce Feist's answer and is very well explained.
[Java implementation: https://discuss.leetcode.com/topic/14611/java-quick-select]
But there is a variation of this algorithm named median of medians; you can find a good explanation at this link:
http://austinrochford.com/posts/2013-10-28-median-of-medians.html
Java implementation of this:
- https://stackoverflow.com/a/27719796/957979
I faced a similar problem yesterday.
I wrote a method with Java generics to calculate the median value of any collection of Numbers; you can apply the method to collections of Doubles, Integers, or Floats, and it returns a double. Please consider that my method creates another collection in order not to alter the original one.
I provide a test as well; have fun. ;-)
public static <T extends Number & Comparable<T>> double median(Collection<T> numbers) {
    if (numbers.isEmpty()) {
        throw new IllegalArgumentException("Cannot compute median on empty collection of numbers");
    }
    List<T> numbersList = new ArrayList<>(numbers);
    Collections.sort(numbersList);
    int middle = numbersList.size() / 2;
    if (numbersList.size() % 2 == 0) {
        return 0.5 * (numbersList.get(middle).doubleValue() + numbersList.get(middle - 1).doubleValue());
    } else {
        return numbersList.get(middle).doubleValue();
    }
}
JUnit test code snippet:
/**
 * Test of median method, of class Utils.
 */
@Test
public void testMedian() {
    System.out.println("median");
    Double expResult = 3.0;
    Double result = Utils.median(Arrays.asList(3.0, 2.0, 1.0, 9.0, 13.0));
    assertEquals(expResult, result);
    expResult = 3.5;
    result = Utils.median(Arrays.asList(3.0, 2.0, 1.0, 9.0, 4.0, 13.0));
    assertEquals(expResult, result);
}
Usage example (consider the class name is Utils):
List<Integer> intValues = ... //omitted init
Set<Float> floatValues = ... //omitted init
.....
double intListMedian = Utils.median(intValues);
double floatSetMedian = Utils.median(floatValues);
Note: my method works on collections, you can convert arrays of numbers to list of numbers as pointed here
And nobody is paying attention to the case when the list contains only one element (list.size == 1). All your answers will crash with an index-out-of-bounds exception, because integer division returns zero (1 / 2 = 0). Correct answer (in Kotlin):
MEDIAN("MEDIAN") {
override fun calculate(values: List<BigDecimal>): BigDecimal? {
if (values.size == 1) {
return values.first()
}
if (values.size > 1) {
val valuesSorted = values.sorted()
val mid = valuesSorted.size / 2
return if (valuesSorted.size % 2 != 0) {
valuesSorted[mid]
} else {
AVERAGE.calculate(listOf(valuesSorted[mid - 1], valuesSorted[mid]))
}
}
return null
}
},
As @Bruce Feist mentions, for a large number of elements I'd avoid any solution involving sorting if performance is something you are concerned about. A different approach from those suggested in the other answers is Hoare's algorithm for finding the k-th smallest element of n items. This algorithm runs in O(n) on average.
public int findKthSmallest(int[] array, int k) {
    if (array.length < 10) {
        Arrays.sort(array);
        return array[k];
    }
    int start = 0;
    int end = array.length - 1;
    int x, temp;
    int i, j;
    while (start < end) {
        x = array[k];
        i = start;
        j = end;
        do {
            while (array[i] < x)
                i++;
            while (x < array[j])
                j--;
            if (i <= j) {
                temp = array[i];
                array[i] = array[j];
                array[j] = temp;
                i++;
                j--;
            }
        } while (i <= j);
        if (j < k)
            start = i;
        if (k < i)
            end = j;
    }
    return array[k];
}
And to find the median:
public int median(int[] array) {
    int length = array.length;
    if ((length & 1) == 0) // even: average the two middle elements (0-based indices length/2 - 1 and length/2)
        return (findKthSmallest(array, length / 2 - 1) + findKthSmallest(array, length / 2)) / 2;
    else // odd
        return findKthSmallest(array, length / 2);
}
public static int median(int[] arr) {
    int median;
    java.util.Arrays.sort(arr);
    if (arr.length % 2 == 1) {
        median = arr[arr.length / 2];
    } else {
        median = (arr[arr.length / 2] + arr[(arr.length / 2) - 1]) / 2;
    }
    return median;
}
Check out the Arrays.sort methods:
http://docs.oracle.com/javase/6/docs/api/java/util/Arrays.html
You should also really abstract finding the median into its own method, and just return the value to the calling method. This will make testing your code much easier.
public int[] data = {31, 29, 47, 48, 23, 30, 21,
        40, 23, 39, 47, 47, 42, 44, 23, 26, 44, 32, 20, 40};

public double median() {
    Arrays.sort(this.data);
    double result = 0;
    int size = this.data.length;
    if (size % 2 == 1) {
        result = data[(size - 1) / 2]; // middle element, 0-based
        System.out.println(" uneven size : " + result);
    } else {
        int middle_pair_first_index = (size - 1) / 2;
        result = (data[middle_pair_first_index + 1] + data[middle_pair_first_index]) / 2.0;
        System.out.println(" Even size : " + result);
    }
    return result;
}
package arrays;

public class Arraymidleelement {

    static public double middleArrayElement(int[] arr) {
        double mid;
        if (arr.length % 2 == 0) {
            mid = ((double) arr[arr.length / 2] + (double) arr[arr.length / 2 - 1]) / 2;
            return mid;
        }
        return arr[arr.length / 2];
    }

    public static void main(String[] args) {
        int arr[] = {1, 2, 3, 4, 5, 6};
        System.out.println(middleArrayElement(arr));
    }
}

Java: simplest integer hash

I need a quick hash function for integers:
int hash(int n) { return ...; }
Is there something that exists already in Java?
The minimal properties that I need are:
hash(n) & 1 does not appear periodic when used with a bunch of consecutive values of n.
hash(n) & 1 is approximately equally likely to be 0 or 1.
HashMap, as well as Guava's hash-based utilities, use the following method on hashCode() results to improve bit distributions and defend against weaker hash functions:
/*
 * This method was written by Doug Lea with assistance from members of JCP
 * JSR-166 Expert Group and released to the public domain, as explained at
 * http://creativecommons.org/licenses/publicdomain
 *
 * As of 2010/06/11, this method is identical to the (package private) hash
 * method in OpenJDK 7's java.util.HashMap class.
 */
static int smear(int hashCode) {
    hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
    return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
}
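As a quick way to eyeball the question's two criteria against smear (test snippet mine), dump the low bit for a run of consecutive inputs and count the ones:

public class SmearDemo {
    static int smear(int hashCode) {
        hashCode ^= (hashCode >>> 20) ^ (hashCode >>> 12);
        return hashCode ^ (hashCode >>> 7) ^ (hashCode >>> 4);
    }

    public static void main(String[] args) {
        int ones = 0;
        StringBuilder bits = new StringBuilder();
        for (int n = 0; n < 64; n++) {
            int b = smear(n) & 1; // the bit the question cares about
            ones += b;
            bits.append(b);
        }
        System.out.println(bits);              // inspect for periodicity
        System.out.println(ones + "/64 ones"); // should be near half
    }
}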
So, I read this question and thought, hmm, this is a pretty math-y question; it's probably out of my league. Then I ended up spending so much time thinking about it that I actually believe I've got the answer: no function can satisfy the criterion that f(n) & 1 is non-periodic for consecutive values of n.
Hopefully someone will tell me how ridiculous my reasoning is, but until then I believe it's correct.
Here goes: Any binary integer n can be represented as either 1...0 or 1...1, and only the least significant bit of that bitmap will affect the result of n & 1. Further, the next consecutive integer n + 1 will always contain the opposite least significant bit. So, clearly any series of consecutive integers will exhibit a period of 2 when passed to the function n & 1. So then, is there any function f(n) that will sufficiently distribute the series of consecutive integers such that periodicity is eliminated?
Any function f(n) = n + c fails, as c must end in either 0 or 1, so the LSB will either flip or stay the same depending on the constant chosen.
The above also eliminates subtraction for all trivial cases, but I have not taken the time to analyze the carry behavior yet, so there may be a crack here.
Any function f(n) = c*n fails, as the LSB will always be 0 if c ends in 0 and always be equal to the LSB of n if c ends in 1.
Any function f(n) = n^c (n raised to the power c) fails, by similar reasoning: a power function always has the same LSB as n.
Any function f(n) = c^n fails, for the same reason.
Division and modulus were a bit less intuitive to me, but basically, the LSB of either option ends up being determined by a subtraction (already ruled out). The modulus will also obviously have a period equal to the divisor.
Unfortunately, I don't have the rigor necessary to prove this, but I believe any combination of the above operations will ultimately fail as well. This leads me to believe that we can rule out any transcendental function, because these are implemented with polynomials (Taylor series? not a terminology guy).
Finally, I held out hope on the train ride home that counting the bits would work; however, this is actually a periodic function as well. The way I thought about it was, imagine taking the sum of the digits of any decimal number. That sum obviously would run from 0 through 9, then drop to 1, run from 1 to 10, then drop to 2... It has a period, the range just keeps shifting higher the higher we count. We can actually do the same thing for the sum of the binary digits, in which case we get something like: 0,1,1,2,2,....5,5,6,6,7,7,8,8....
Did I leave anything out?
TL;DR I don't think your question has an answer.
The NumberOfSetBits function seems to vary quite a lot more than hashCode, and as such seems more appropriate for your needs (unless you need the range of the hash function to be wider than 0..32). Turns out there is already a fairly efficient algorithm on SO.
See Best algorithm to count the number of set bits in a 32-bit integer.
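Since the JDK already ships a popcount, a sketch of this suggestion (mine, using the standard Integer.bitCount) is essentially a one-liner; as a side note, the low bit of the bit count over consecutive n is the Thue-Morse sequence, which is aperiodic:

public class PopCountHash {
    static int hash(int n) {
        return Integer.bitCount(n); // built-in set-bit counter
    }

    public static void main(String[] args) {
        // low bit over consecutive n is the Thue-Morse sequence
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 16; n++)
            sb.append(hash(n) & 1);
        System.out.println(sb); // 0110100110010110
    }
}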
I did some experimentation (see the test program below); computation of 2^n in Galois fields and floor(A*sin(n)) both did very well at producing a sequence of "random" bits. I tried multiplicative congruential random number generators, some algebra, and CRC (which is analogous to k*n in Galois fields), none of which did well.
The floor(A*sin(n)) approach is the simplest and quickest; the 2^n calculation in GF(2^32) takes approx 64 multiplies and 1024 XORs worst case, but the periodicity of the output bits is extremely well understood in the context of linear-feedback shift registers.
package com.example.math;

public class QuickHash {

    interface Hasher {
        public int hash(int n);
    }

    static class MultiplicativeHasher1 implements Hasher {

        /* multiplicative random number generator
         * from L'Ecuyer is x[n+1] = 1223106847 x[n] mod (2^32-5)
         * http://dimsboiv.uqac.ca/Cours/C2012/8INF802_Hiv12/ref/paper/RNG/TableLecuyer.pdf
         */
        final static long a = 1223106847L;
        final static long m = (1L << 32) - 5;

        /*
         * iterative step towards computing mod m
         * (j*(2^32)+k) mod (2^32-5)
         *   = (j*(2^32-5)+j*5+k) mod (2^32-5)
         *   = (j*5+k) mod (2^32-5)
         * repeat twice to get a number between 0 and 2^31+24
         */
        private long quickmod(long x) {
            long j = x >>> 32;
            long k = x & 0xffffffffL;
            return j * 5 + k;
        }

        // treat n as unsigned before computation
        @Override public int hash(int n) {
            long h = a * (n & 0xffffffffL);
            long h2 = quickmod(quickmod(h));
            return (int) (h2 >= m ? (h2 - m) : h2);
        }

        @Override public String toString() { return getClass().getSimpleName(); }
    }

    /**
     * computes (2^n) mod P where P is the polynomial in GF2
     * with coefficients 2^(k+1) represented by the bits k=31:0 in "poly";
     * coefficient 2^0 is always 1
     */
    static class GF32Hasher implements Hasher {

        static final public GF32Hasher CRC32 = new GF32Hasher(0x82608EDB, 32);

        final private int poly;
        final private int ofs;

        public GF32Hasher(int poly, int ofs) {
            this.ofs = ofs;
            this.poly = poly;
        }

        static private long uint(int x) { return x & 0xffffffffL; }

        // modulo GF2 via repeated subtraction
        int mod(long n) {
            long rem = n;
            long q = uint(this.poly);
            q = (q << 32) | (1L << 31);
            long bitmask = 1L << 63;
            for (int i = 0; i < 32; ++i, bitmask >>>= 1, q >>>= 1) {
                if ((rem & bitmask) != 0)
                    rem ^= q;
            }
            return (int) rem;
        }

        int mul(int x, int y) {
            return mod(uint(x) * uint(y));
        }

        int pow2(int n) {
            // compute 2^n mod P using repeated squaring
            int y = 1;
            int x = 2;
            while (n > 0) {
                if ((n & 1) != 0)
                    y = mul(y, x);
                x = mul(x, x);
                n = n >>> 1;
            }
            return y;
        }

        @Override public int hash(int n) {
            return pow2(n + this.ofs);
        }

        @Override public String toString() {
            return String.format("GF32[%08x, ofs=%d]", this.poly, this.ofs);
        }
    }

    static class QuickHasher implements Hasher {
        @Override public int hash(int n) {
            return (int) ((131111L * n) ^ n ^ (1973 * n) % 7919);
        }
        @Override public String toString() { return getClass().getSimpleName(); }
    }

    // adapted from http://www.w3.org/TR/PNG-CRCAppendix.html
    static class CRC32TableHasher implements Hasher {

        final private int table[];
        static final private int polyval = 0xedb88320;

        public CRC32TableHasher() {
            this.table = make_table();
        }

        /* Make the table for a fast CRC. */
        static public int[] make_table() {
            int[] table = new int[256];
            int c;
            int n, k;
            for (n = 0; n < 256; n++) {
                c = n;
                for (k = 0; k < 8; k++) {
                    if ((c & 1) != 0)
                        c = polyval ^ (c >>> 1);
                    else
                        c = c >>> 1;
                }
                table[n] = (int) c;
            }
            return table;
        }

        public int iterate(int state, int i) {
            return this.table[(state ^ i) & 0xff] ^ (state >>> 8);
        }

        @Override public int hash(int n) {
            int h = -1;
            h = iterate(h, n >>> 24);
            h = iterate(h, n >>> 16);
            h = iterate(h, n >>> 8);
            h = iterate(h, n);
            return h ^ -1;
        }

        @Override public String toString() { return getClass().getSimpleName(); }
    }

    static class TrigHasher implements Hasher {
        @Override public String toString() { return getClass().getSimpleName(); }
        @Override public int hash(int n) {
            double s = Math.sin(n);
            return (int) Math.floor((1 << 31) * s);
        }
    }

    private static void test(Hasher hasher) {
        System.out.println(hasher + ":");
        for (int i = 0; i < 64; ++i) {
            int h = hasher.hash(i);
            System.out.println(String.format("%08x -> %08x %%2 = %d", i, h, (h & 1)));
        }
        for (int i = 0; i < 256; ++i) {
            System.out.print(hasher.hash(i) & 1);
        }
        System.out.println();
        analyzeBits(hasher);
    }

    private static void analyzeBits(Hasher hasher) {
        final int N = 65536;
        final int maxrunlength = 32;
        int[][] runs = {new int[maxrunlength], new int[maxrunlength]};
        int[] count = new int[2];
        int prev = -1;
        System.out.println("Run length test of " + N + " bits");
        for (int i = 0; i < maxrunlength; ++i) {
            runs[0][i] = 0;
            runs[1][i] = 0;
        }
        int runlength_minus1 = 0;
        for (int i = 0; i < N; ++i) {
            int b = hasher.hash(i) & 0x1;
            count[b]++;
            if (b == prev)
                ++runlength_minus1;
            else if (i > 0) {
                ++runs[prev][runlength_minus1];
                runlength_minus1 = 0;
            }
            prev = b;
        }
        ++runs[prev][runlength_minus1];
        System.out.println(String.format("%d zeros, %d ones", count[0], count[1]));
        for (int i = 0; i < maxrunlength; ++i) {
            System.out.println(String.format("%d runs of %d zeros, %d runs of %d ones",
                    runs[0][i], i + 1, runs[1][i], i + 1));
        }
    }

    public static void main(String[] args) {
        Hasher[] hashers = {
            new MultiplicativeHasher1(),
            GF32Hasher.CRC32,
            new QuickHasher(),
            new CRC32TableHasher(),
            new TrigHasher()
        };
        for (Hasher hasher : hashers) {
            test(hasher);
        }
    }
}
The simplest hash for an int value is the int value itself.
See the Java Integer class:
public int hashCode()
public static int hashCode(int value)
Returns:
a hash code value for this object, equal to the primitive int value represented by this Integer object.
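A small aside (snippet mine): because the identity hash returns n itself, its low bit over consecutive inputs alternates with period 2, which is exactly what the question's first criterion rules out:

public class IdentityHashDemo {
    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 8; n++)
            sb.append(Integer.hashCode(n) & 1); // Integer.hashCode(n) == n
        System.out.println(sb); // 01010101: periodic, fails criterion 1
    }
}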

why is in place merge sort not stable?

The implementation below is stable, as it uses <= instead of < at the line marked XXX. This also makes it more efficient. Is there any reason to use < rather than <= at this line?
/**
 * Class for in-place MergeSort.
 **/
class MergeSortAlgorithm extends SortAlgorithm {

    void sort(int a[], int lo0, int hi0) throws Exception {
        int lo = lo0;
        int hi = hi0;
        pause(lo, hi);
        if (lo >= hi) {
            return;
        }
        int mid = (lo + hi) / 2;
        /*
         * Partition the list into two lists and sort them recursively
         */
        sort(a, lo, mid);
        sort(a, mid + 1, hi);
        /*
         * Merge the two sorted lists
         */
        int end_lo = mid;
        int start_hi = mid + 1;
        while ((lo <= end_lo) && (start_hi <= hi)) {
            pause(lo);
            if (stopRequested) {
                return;
            }
            if (a[lo] <= a[start_hi]) { // LINE XXX
                lo++;
            } else {
                /*
                 * a[lo] > a[start_hi]
                 * The next element comes from the second list,
                 * move the a[start_hi] element into the next
                 * position and shuffle all the other elements up.
                 */
                int T = a[start_hi];
                for (int k = start_hi - 1; k >= lo; k--) {
                    a[k + 1] = a[k];
                    pause(lo);
                }
                a[lo] = T;
                lo++;
                end_lo++;
                start_hi++;
            }
        }
    }

    void sort(int a[]) throws Exception {
        sort(a, 0, a.length - 1);
    }
}
Because the <= in your code ensures that same-valued elements (in the left and right halves of the array being sorted) won't be exchanged.
It also avoids useless exchanges.
if (a[lo] <= a[start_hi]) {
    /* The left value is smaller than or equal to the right one; leave them as is. */
    /* In particular, if the values are the same, they won't be exchanged. */
    lo++;
} else {
    /*
     * If the value in the right half is smaller than the value in the left half,
     * insert the right one just before the left one, i.e., they're exchanged.
     */
    ...
}
Assume the same value (e.g., 5) appears in both halves and the operator above were <.
As the comments above show, the right 5 would then be inserted before the left 5; in other words, same-valued elements would be exchanged.
This means the sort would not be stable.
Also, it is inefficient to exchange same-valued elements.
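A toy illustration of the difference (demo mine, merging strings by length rather than ints so that equal keys are distinguishable): with <= the left run wins ties and the original order is preserved; with < the right run wins ties and equal elements flip:

import java.util.Arrays;

public class StableMergeDemo {
    // merges two runs already sorted by length; "stable" picks the
    // left element on ties (<=), the other variant picks the right (<)
    static String[] merge(String[] left, String[] right, boolean stable) {
        String[] out = new String[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            boolean takeLeft = stable
                    ? left[i].length() <= right[j].length()
                    : left[i].length() < right[j].length();
            out[k++] = takeLeft ? left[i++] : right[j++];
        }
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
        return out;
    }

    public static void main(String[] args) {
        String[] left = {"ab", "cde"};
        String[] right = {"xy", "zzz"};
        System.out.println(Arrays.toString(merge(left, right, true)));
        // [ab, xy, cde, zzz]  (equal lengths keep left-before-right order)
        System.out.println(Arrays.toString(merge(left, right, false)));
        // [xy, ab, zzz, cde]  (equal lengths flipped: not stable)
    }
}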
I guess the cause of the inefficiency comes from the algorithm itself: your merging stage is implemented using insertion sort (as you know, that is O(n^2)).
You may have to re-implement it if you sort huge arrays.
Fastest known in place stable sort:
http://thomas.baudel.name/Visualisation/VisuTri/inplacestablesort.html
