We are trying to optimize heavy memory operations in Java and ran into some anomalies. From our data, we formed the hypothesis that an array (memory block) is loaded into the CPU cache after many accesses, but that after cloning this array many times the cache fills up and the initial array is evicted back to RAM.
To test this, we set up a benchmark. It does the following:
Create an array with a given size
Write some data into the fields
Read/iterate it a million times (to push it into CPU cache)
Clone it once into a new array
Repeatedly clone the most recent copy into a new array, a given number of times, each time using the new copy as the source of the next clone
Additionally, after each of these steps the array is iterated three times and the needed time is measured for each iteration. Here is the code:
private static long[] read(byte[] array, int count, boolean logTimes) {
    long[] times = null;
    if (logTimes) {
        times = new long[count];
    }
    int sum = 0;
    for (int n = 0; n < count; n++) {
        long start = System.nanoTime();
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        if (logTimes) {
            long time = System.nanoTime() - start;
            times[n] = time;
        }
    }
    return times;
}
public static void main(String[] args) {
    int arraySize = Integer.parseInt(args[0]);
    int clones = Integer.parseInt(args[1]);
    byte[] array = new byte[arraySize];
    long[] initialReadTimes = read(array, 3, true);
    // Fill with some non-zero content
    for (int i = 0; i < array.length; i++) {
        array[i] = (byte) i;
    }
    long[] afterWriteTimes = read(array, 3, true);
    // Make this array important, so it lands in CPU Cache
    read(array, 1_000_000, false);
    long[] afterReadTimes = read(array, 3, true);
    long[] afterFirstCloneReadTimes = null;
    byte[] copy = new byte[array.length];
    System.arraycopy(array, 0, copy, 0, array.length);
    for (int i = 1; i <= clones; i++) {
        byte[] copy2 = new byte[copy.length];
        System.arraycopy(copy, 0, copy2, 0, copy.length);
        copy = copy2;
        if (i == 1) {
            afterFirstCloneReadTimes = read(array, 3, true);
        }
    }
    long[] afterAllClonesReadTimes = read(array, 3, true);
    // Write to CSV
}
We ran this benchmark with arraysize=10,000 and clones=10,000,000 on a 2nd gen i5 with 16 GB RAM:
There was quite a lot of variation, though: the 2nd and 3rd runs sometimes had different times, and there were occasional peaks in the 2nd and 3rd runs of the last read benchmark.
These results seem pretty confusing. I think that this could show that upon array initialization, it is not immediately loaded into CPU cache, because the initial read times are relatively high. After writing nothing seems to have changed. Only after iterating a lot the access times become faster, while the first run is always slower (because of the measuring overhead that runs between the readings?). Also cloning/filling memory with new arrays does not seem to have an impact at all. Could anyone explain these results?
We assumed that some of this might stem from java specific memory management, so we tried to reimplement the benchmark in C++:
void read(unsigned char array[], int length, int count, std::vector<long int> & logTimes) {
    for (int c = 0; c < count; c++) {
        int sum = 0;
        std::chrono::high_resolution_clock::time_point t1;
        if (count <= 3) {
            t1 = std::chrono::high_resolution_clock::now();
        }
        for (int i = 0; i < length; i++) {
            sum += array[i];
        }
        if (count <= 3) {
            std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
            long int duration = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
            std::cout << duration << " ns\n";
        }
    }
}
int main(int argc, char ** args)
{
    int ARRAYSIZE = 10000;
    int CLONES = 10000000;
    std::vector<long int> initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes, null;
    unsigned char array[ARRAYSIZE];
    read(array, ARRAYSIZE, 3, initialTimes);
    for (long long i = 0; i < ARRAYSIZE; i++) {
        array[i] = i;
    }
    std::cout << "Reads after writing:\n";
    read(array, ARRAYSIZE, 3, afterWritingTimes);
    read(array, ARRAYSIZE, 1000000, null);
    std::cout << "Reads after 1M Reads:\n";
    read(array, ARRAYSIZE, 3, afterReadTimes);
    unsigned char copy[ARRAYSIZE];
    unsigned char * ptr_copy = copy;
    std::memcpy(ptr_copy, array, ARRAYSIZE);
    for (long long i = 0; i < CLONES; i++) {
        unsigned char copy2[ARRAYSIZE];
        std::memcpy(copy2, ptr_copy, ARRAYSIZE);
        ptr_copy = copy2;
        if (i == 0) {
            read(array, ARRAYSIZE, 3, afterFirstCloneTimes);
        }
    }
    std::cout << "Reads after cloning:\n";
    read(array, ARRAYSIZE, 3, afterCloneTimes);
    writeTimesToCSV(initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes);
    std::cout << "Finished.\n";
}
Using the same parameters, we got the following results:
So in C++ the times are rather similar to each other, with some strange peaks in the 2nd run. This seems to show that above faster timings were caused by java optimizations (or rather suboptimal handling in the first readings). Does this mean that the CPU cache is not involved at all?
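One way to probe the "Java optimizations" explanation is to separate warm-up from measurement. The following is only a minimal sketch, not the benchmark above: the class name WarmupBench, the sink field and the round counts are assumptions made for illustration. It runs the read loop untimed first, so the JIT compiler has a chance to compile it, and it keeps the computed sum alive so that the loop is less likely to be discarded as dead code.

```java
final class WarmupBench {
    // Hypothetical sink: storing the result keeps the summing loop observable.
    static long sink;

    // Sums all bytes of the array; this is the operation being timed.
    static long sum(byte[] array) {
        long s = 0;
        for (byte b : array) {
            s += b;
        }
        return s;
    }

    // Runs 'warmupRounds' untimed passes, then returns the duration in
    // nanoseconds of each of 'measuredRounds' timed passes.
    static long[] timedReads(byte[] array, int warmupRounds, int measuredRounds) {
        for (int n = 0; n < warmupRounds; n++) {
            sink += sum(array);
        }
        long[] times = new long[measuredRounds];
        for (int n = 0; n < measuredRounds; n++) {
            long start = System.nanoTime();
            sink += sum(array);
            times[n] = System.nanoTime() - start;
        }
        return times;
    }

    public static void main(String[] args) {
        byte[] array = new byte[10_000];
        for (long t : timedReads(array, 10_000, 3)) {
            System.out.println(t + " ns");
        }
    }
}
```

If the timed passes stay fast regardless of how many warm-up passes precede them, warm-up is probably not the dominant effect; if they only become fast after many passes, the earlier numbers say more about compilation than about the cache.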
Please refer to this problem from Hackerrank
HackerLand National Bank has a simple policy for warning clients about possible fraudulent account activity. If the amount spent by a client on a particular day is greater than or equal to twice the client's median spending for a trailing number of days, they send the client a notification about potential fraud. The bank doesn't send the client any notifications until they have at least that trailing number of prior days' transaction data.
I have written the following code. However, it passes only some of the test cases; the others fail with 'terminated due to timeout'. Can anyone please tell me how I can improve the code?
import java.io.*;
import java.math.*;
import java.security.*;
import java.text.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.regex.*;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
//Delaring Variables
int iterations,itr,length,median,midDummy,midL,midR, midDummy2,i,i1,temp,count;
float mid,p,q;
length = expenditure.length;
iterations = length-d;
count = 0;
int[] exSub = new int[d];
// Enter the elements in the subarray
//Sort the exSub array
for(int k=0; k<(d-1); k++)
for(int j=k+1; j<d; j++)
temp = exSub[j];
exSub[j] = exSub[k];
exSub[k] = temp;
//Printing the exSub array in each iteration
for(int l = 0 ; l<d ; l++)
//For each iteration claculate the median
if(d%2 == 0) // even
midDummy = d/2;
p= (float)exSub[midDummy];
q= (float)exSub[midDummy-1];
mid = (p+q)/2;
//mid = (exSub[midDummy]+exSub [midDummy-1])/2;
else // odd
midDummy2 =d/2;
return count;
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
int result = activityNotifications(expenditure, d);
The first rule on performance improvement is: Don't improve the performance if it's not needed.
Performance improvements usually lead to code that is less readable and therefore it should only be done when it's really needed.
The second rule is: Improve algorithms and data-structures before low-level improvements.
If you need to improve the performance of your code, always try to use more efficient algorithms and data structures before making low-level improvements. In your code example that would be: don't use bubble sort, but use a more efficient algorithm like quicksort or merge sort, because they have an (average) time complexity of O(n*log(n)), while bubble sort has a time complexity of O(n^2), which is much slower when you have to sort big arrays. You can use Arrays.sort(int[]) to do this.
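As a rough illustration of that advice, the sketch below computes the median of one window with Arrays.sort instead of a hand-written bubble sort. The class and method names (WindowMedian, medianOfWindow) are made up for this example and are not part of the original solution.

```java
import java.util.Arrays;

final class WindowMedian {
    // Returns the median of expenditure[from .. from + d - 1].
    // The window is copied first so the caller's array is left untouched.
    static double medianOfWindow(int[] expenditure, int from, int d) {
        int[] window = Arrays.copyOfRange(expenditure, from, from + d);
        Arrays.sort(window); // library sort instead of an O(d^2) bubble sort
        if (d % 2 == 0) {
            // even length: average of the two middle values
            return (window[d / 2 - 1] + window[d / 2]) / 2.0;
        }
        // odd length: the single middle value
        return window[d / 2];
    }

    public static void main(String[] args) {
        int[] exp = {2, 3, 4, 2, 3, 6, 8, 4, 5};
        System.out.println(medianOfWindow(exp, 0, 5));
    }
}
```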
Your data-structures are only arrays so this can't be improved in your code.
This will give your code quite some speedup without making it unreadable. Improvements like replacing simple calculations with slightly faster ones using bit shifts and similar tricks (which are pretty hard to understand if used too often) will almost always lead to code that is only slightly faster but that no one can easily understand anymore.
Some improvements that could be applied to your code (that will also only slightly improve the performance) are:
Replace while loops with for loops if possible (they can be improved by the compiler)
Don't use System.out.println for large amounts of output if it's not really needed (because it's quite slow for big texts)
Try to copy arrays using System.arraycopy which usually is faster than copying using while loops
So an improved code of yours could look like this (I marked the changed parts with comments):
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
//Delaring Variables
int iterations, itr, length, median, midDummy, midL, midR, midDummy2, i, i1, temp, count;
float mid, p, q;
length = expenditure.length;
iterations = length - d;
i = 0;
i1 = 0;
itr = 0;
count = 0;
int[] exSub = new int[d];
//EDIT: replace while loops with for loops if possible
//while (iterations > 0) {
for (int iter = 0; iter < iterations; iter++) {
//EDIT: here you can again use a for loop or just use System.arraycopy which should be (slightly) faster
// Enter the elements in the subarray
/*while (i1 < d) {
exSub[i1] = expenditure[i + i1];
System.arraycopy(expenditure, i, exSub, 0, d);
//EDIT: Don't use bubble sort!!! It's one of the worst sorting algorithms, because it's really slow
//Bubble sort uses time complexity O(n^2); others (like merge-sort or quick-sort) only use O(n*log(n))
//The easiest and fastest solution is: don't implement sorting by yourself, but use Arrays.sort(int[]) from the java API
//Sort the exSub array
/*for (int k = 0; k < (d - 1); k++) {
for (int j = k + 1; j < d; j++) {
if (exSub[j] < exSub[k]) {
temp = exSub[j];
exSub[j] = exSub[k];
exSub[k] = temp;
//Printing the exSub array in each iteration
//EDIT: printing many results also takes much time, so only print the results if it's really needed
/*for (int l = 0; l < d; l++) {
i1 = 0;
//For each iteration claculate the median
if (d % 2 == 0) // even
midDummy = d / 2;
p = (float) exSub[midDummy];
q = (float) exSub[midDummy - 1];
mid = (p + q) / 2;
//mid = (exSub[midDummy]+exSub [midDummy-1])/2;
else // odd
midDummy2 = d / 2;
mid = exSub[midDummy2];
if (expenditure[itr + d] >= 2 * mid) {
//iterations--;//EDIT: don't change iterations anymore because of the for loop
System.out.println("Mid:" + mid);
System.out.println("Count:" + count);
return count;
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
int result = activityNotifications(expenditure, d);
You can make the solution even faster if you don't sort the complete (sub-)array in every iteration, but instead only remove one value (the oldest day, which is no longer in the window) and insert one new value (the newest day) at the correct position (as @Vojtěch Kaiser mentioned in his answer).
This makes it even faster, because sorting an array takes O(d*log(d)) time, while adding a new value to an already sorted collection takes only O(log(d)) if you are using a search tree. When using an array (like I did in the example below) it takes O(d), because the array values have to be shifted, which takes linear time (as @dyukha mentioned in the comments). So the improvement (again) comes from using a better algorithm (this solution could be improved further by using a search tree instead of an array).
So the new solution could look like this:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.Scanner;
public class Solution {
// Complete the activityNotifications function below.
static int activityNotifications(int[] expenditure, int d) {
    //Declaring Variables
    int iterations, length, midDummy, midDummy2, count;//EDIT: removed some unused variables here
    float mid, p, q;
    length = expenditure.length;
    iterations = length - d;
    count = 0;
    //EDIT: add the first d values to the sub-array and sort it (only once)
    int[] exSub = new int[d];
    System.arraycopy(expenditure, 0, exSub, 0, d);
    Arrays.sort(exSub);
    for (int iter = 0; iter < iterations; iter++) {
        //EDIT: don't sort the complete array in every iteration
        //instead remove the one value (the first day that is not used anymore) and add the new value (of the new day) into the sorted array
        //sorting is done in O(d * log(d)); removing and inserting one value in a sorted array is done in O(d) (O(log(d)) with a search tree)
        if (iter > 0) {//not for the first iteration
            int remove = expenditure[iter - 1];
            int indexToRemove = find(exSub, remove);
            //remove the index and move the following values one index to the left
            exSub[indexToRemove] = 0;//not needed; just to make it more clear what's happening
            System.arraycopy(exSub, indexToRemove + 1, exSub, indexToRemove, exSub.length - indexToRemove - 1);
            exSub[d - 1] = 0;//not needed again; just to make it more clear what's happening
            int newValue = expenditure[iter + d - 1];
            //insert the new value to the correct position
            insertIntoSortedArray(exSub, newValue);
        }
        //For each iteration calculate the median
        if (d % 2 == 0) // even
        {
            midDummy = d / 2;
            p = exSub[midDummy];
            q = exSub[midDummy - 1];
            mid = (p + q) / 2;
            //mid = (exSub[midDummy]+exSub [midDummy-1])/2;
        }
        else // odd
        {
            midDummy2 = d / 2;
            mid = exSub[midDummy2];
        }
        if (expenditure[iter + d] >= 2 * mid) {
            count++;
        }
    }
    System.out.println("Count:" + count);
    return count;
}
/**
 * Find the position of value in the array
 */
private static int find(int[] array, int value) {
    int index = -1;
    for (int i = 0; i < array.length; i++) {
        if (array[i] == value) {
            index = i;
        }
    }
    return index;
}

/**
 * Find the correct position to insert value into the array by bisection search
 */
private static void insertIntoSortedArray(int[] array, int value) {
    int[] indexRange = new int[] {0, array.length - 1};
    while (indexRange[1] - indexRange[0] > 0) {
        int mid = indexRange[0] + (indexRange[1] - indexRange[0]) / 2;
        if (value > array[mid]) {
            if (mid == indexRange[0]) {
                indexRange[0] = mid + 1;
            }
            else {
                indexRange[0] = mid;
            }
        }
        else {
            if (mid == indexRange[1]) {
                indexRange[1] = mid - 1;
            }
            else {
                indexRange[1] = mid;
            }
        }
    }
    //shift the tail one position to the right and put the value into the gap
    System.arraycopy(array, indexRange[0], array, indexRange[0] + 1, array.length - indexRange[0] - 1);
    array[indexRange[0]] = value;
}
private static final Scanner scanner = new Scanner(System.in);
public static void main(String[] args) throws IOException {
BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(System.getenv("OUTPUT_PATH")));
String[] nd = scanner.nextLine().split(" ");
int n = Integer.parseInt(nd[0]);
int d = Integer.parseInt(nd[1]);
int[] expenditure = new int[n];
String[] expenditureItems = scanner.nextLine().split(" ");
for (int i = 0; i < n; i++) {
int expenditureItem = Integer.parseInt(expenditureItems[i]);
expenditure[i] = expenditureItem;
int result = activityNotifications(expenditure, d);
//Just for testing; can be deleted if you don't need it
/*int[] exp = new int[] {2, 3, 4, 2, 3, 6, 8, 4, 5};
int d = 5;
activityNotifications(exp, d);
int[] exp2 = new int[] {1, 2, 3, 4, 4};
d = 4;
activityNotifications(exp2, d);*/
Your main concern is that you are sorting the partial array in every iteration, costing you total complexity of the problem O(n d log(d)), which can get pretty hairy for large d values.
What you want is to keep the array sorted between iterations and sort in/out changed values. For that you would implement binary search tree (BST) or some other balanced option (AVL, ...), perform O(log(d)) removal of oldest value, then perform O(log(d)) insertion of new value, and simply look in the middle for median. Total asymptotic complexity would be O(n log(d)) which is as far as I know the best you can get - rest of the optimization is low level dirty work.
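Take a look at Java's TreeSet (https://docs.oracle.com/javase/10/docs/api/java/util/TreeSet.html), which should take care of most of the work, but keep in mind that the underlying structure is made out of objects, which will be slower than arrays. Also note that a TreeSet discards duplicate values, so repeated expenditures need separate handling (for example, a count per value).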
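The idea of keeping the window ordered between iterations can be sketched as follows. This is only one possible arrangement, not necessarily what the answer had in mind: it keeps the lower and upper half of the window in two TreeMap instances that map each value to its count (so duplicates survive), which gives O(log(d)) insertion, removal and median lookup. All names (SlidingMedianSketch, doubledMedian, countNotifications) are made up for illustration, and d >= 1 is assumed. The check mirrors the `>= 2 * mid` comparison from the code above, but works with twice the median to avoid floating point.

```java
import java.util.TreeMap;

final class SlidingMedianSketch {
    // 'low' holds the smaller half of the window, 'high' the larger half.
    // Invariant after every operation: lowSize == highSize or lowSize == highSize + 1.
    private final TreeMap<Integer, Integer> low = new TreeMap<>();
    private final TreeMap<Integer, Integer> high = new TreeMap<>();
    private int lowSize = 0;
    private int highSize = 0;

    private static void inc(TreeMap<Integer, Integer> m, int v) {
        m.merge(v, 1, Integer::sum);
    }

    private static void dec(TreeMap<Integer, Integer> m, int v) {
        // drop the key entirely once its count reaches zero
        if (m.merge(v, -1, Integer::sum) == 0) {
            m.remove(v);
        }
    }

    void add(int v) {
        if (lowSize == 0 || v <= low.lastKey()) {
            inc(low, v);
            lowSize++;
        } else {
            inc(high, v);
            highSize++;
        }
        rebalance();
    }

    // Removes one occurrence of v; v must currently be in the window.
    void remove(int v) {
        if (low.containsKey(v)) {
            dec(low, v);
            lowSize--;
        } else {
            dec(high, v);
            highSize--;
        }
        rebalance();
    }

    private void rebalance() {
        while (lowSize > highSize + 1) {
            int v = low.lastKey();
            dec(low, v);
            lowSize--;
            inc(high, v);
            highSize++;
        }
        while (lowSize < highSize) {
            int v = high.firstKey();
            dec(high, v);
            highSize--;
            inc(low, v);
            lowSize++;
        }
    }

    // Twice the median of the current window (exact, no floating point).
    int doubledMedian() {
        if (lowSize > highSize) {
            return 2 * low.lastKey();
        }
        return low.lastKey() + high.firstKey();
    }

    static int countNotifications(int[] expenditure, int d) {
        SlidingMedianSketch window = new SlidingMedianSketch();
        for (int i = 0; i < d && i < expenditure.length; i++) {
            window.add(expenditure[i]);
        }
        int count = 0;
        for (int i = d; i < expenditure.length; i++) {
            if (expenditure[i] >= window.doubledMedian()) {
                count++;
            }
            window.remove(expenditure[i - d]); // oldest day leaves the window
            window.add(expenditure[i]);        // newest day enters it
        }
        return count;
    }
}
```

Because the maps are built from boxed objects, the constant factor is noticeably larger than for a plain array, which is the trade-off mentioned above.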
I am solving a problem on Codeforces, and I program in Java.
In this problem I create an int array dp[N][5][3] (there are about N*5*3 recursive calls). When N is equal to a million, my program exceeds the memory limit, which is about 256 MB. The same solution in C++ works fine and uses about half as much memory. Why is that?
Here is the code:
private int MAX = (int) 1E6 + 6;
private int[] cnt;
private int[][][] dp;

private int min(int a, int b) {
    return a > b ? b : a;
}

private int max(int a, int b) {
    return a < b ? b : a;
}

private int solve(int x, int t1, int t2) {
    // element x, x used t1 times, x + 1 used t2 times
    if (dp[x][t1][t2] != -1)
        return dp[x][t1][t2];
    else if (x + 3 > MAX)
        return dp[x][t1][t2] = (cnt[x] - t1) / 3 + (cnt[x + 1] - t2) / 3;
    int ans0, ans1 = 0, ans2 = 0;
    ans0 = (cnt[x] - t1) / 3 + solve(x + 1, t2, 0);
    int min = min(cnt[x] - t1, min(cnt[x + 1] - t2, cnt[x + 2]));
    if (min >= 1)
        ans1 = (cnt[x] - t1 - 1) / 3 + 1 + solve(x + 1, t2 + 1, 1);
    if (min >= 2)
        ans2 = (cnt[x] - t1 - 2) / 3 + 2 + solve(x + 1, t2 + 2, 2);
    return dp[x][t1][t2] = max(ans0, max(ans1, ans2));
}

private void solve(InputReader in, PrintWriter out) {
    int n = in.nextInt();
    int m = in.nextInt();
    cnt = new int[MAX];
    for (int i = 0; i < n; i++) cnt[in.nextInt()]++;
    dp = new int[MAX][5][3];
    for (int i = 0; i <= m; i++)
        for (int j = 0; j < 5; j++)
            for (int k = 0; k < 3; k++)
                dp[i][j][k] = -1;
    out.println(solve(1, 0, 0));
}
There is no need for understanding the logic of solve function. Here the recursive method is simply called about N*5*3 times.
If you absolutely need such a large array (and I recommend thinking twice before using one, and looking for a more memory-efficient solution), you can increase the maximum heap size of a Java program at launch time with the -Xmx argument.
E.g.: java -Xmx1G -jar myProgram.jar
to use 1 GB as the maximum heap size (if that is not enough, you can increase it further). JVM options such as -Xmx must come before -jar; anything after the jar file name is passed to the program as an ordinary argument.
Note that you can also specify the initial heap size with the -Xms argument.
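To check whether a heap setting actually took effect, the program can print the limit the JVM reports. A small sketch (the class and method names are made up for illustration; Runtime.maxMemory() is the standard API call):

```java
final class HeapInfo {
    // Maximum amount of memory the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it once with and once without -Xmx shows whether the option reached the JVM or was passed to the program as a plain argument.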
It depends on your code. First of all, when you work with a recursive method, the JVM stores each object on the heap and holds the reference to each object on the stack. To prevent this error, you should set objects you no longer use to null.
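On the heap-object point: in Java, int[N][5][3] is an array of arrays, so each of the N*5 innermost int[3] arrays is a separate heap object with its own header, which is one plausible reason for the extra memory compared with C++. A sketch of one possible mitigation, assuming the inner dimensions 5 and 3 from the question stay fixed, is to store the table in a single flat int[] and compute the index by hand. FlatDp and its methods are hypothetical names, not part of the original code.

```java
import java.util.Arrays;

final class FlatDp {
    private static final int T1 = 5; // size of the second dimension
    private static final int T2 = 3; // size of the third dimension
    private final int[] data;

    // Allocates one contiguous array for n * 5 * 3 entries, all set to -1
    // (the "not computed yet" marker used in the question).
    FlatDp(int n) {
        data = new int[n * T1 * T2];
        Arrays.fill(data, -1);
    }

    // Maps (x, t1, t2) to a position in the flat array (row-major order).
    static int index(int x, int t1, int t2) {
        return (x * T1 + t1) * T2 + t2;
    }

    int get(int x, int t1, int t2) {
        return data[index(x, t1, t2)];
    }

    void set(int x, int t1, int t2, int value) {
        data[index(x, t1, t2)] = value;
    }
}
```

With n around one million the flat array holds about 15 million ints, roughly 60 MB of payload in a single object, instead of millions of small arrays.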
I gave a shot at solving the Hackerland Radio Transmitters programming challenge.
To summarize, the challenge goes as follows:
Hackerland is a one-dimensional city with n houses, where each house i is located at some xi on the x-axis. The Mayor wants to install radio transmitters on the roofs of the city's houses. Each transmitter has a range, k, meaning it can transmit a signal to all houses ≤ k units of distance away.
Given a map of Hackerland and the value of k, can you find the minimum number of transmitters needed to cover every house?
My implementation is as follows:
package biz.tugay;
import java.util.*;
public class HackerlandRadioTransmitters {
public static int minNumOfTransmitters(int[] houseLocations, int transmitterRange) {
// Sort and remove duplicates..
houseLocations = uniqueHouseLocationsSorted(houseLocations);
int towerCount = 0;
for (int nextHouseNotCovered = 0; nextHouseNotCovered < houseLocations.length; ) {
final int towerLocation = HackerlandRadioTransmitters.findNextTowerIndex(houseLocations, nextHouseNotCovered, transmitterRange);
nextHouseNotCovered = HackerlandRadioTransmitters.nextHouseNotCoveredIndex(houseLocations, towerLocation, transmitterRange);
if (nextHouseNotCovered == -1) {
return towerCount;
public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
int towerIndex = houseNotCoveredIndex;
int loop = 0;
while (true) {
if (towerIndex == houseLocations.length - 1) {
if (farthestHouseLocationAllowed >= houseLocations[towerIndex + 1]) {
System.out.println("findNextTowerIndex looped : " + loop);
return towerIndex;
public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
int notCoveredHouseIndex = towerIndex + 1;
int loop = 0;
while (notCoveredHouseIndex < houseLocations.length) {
final int locationOfHouseBeingChecked = houseLocations[notCoveredHouseIndex];
if (locationOfHouseBeingChecked > towerCoversUntil) {
break; // Tower does not cover the house anymore, break the loop..
if (notCoveredHouseIndex == houseLocations.length) {
notCoveredHouseIndex = -1;
System.out.println("nextHouseNotCoveredIndex looped : " + loop);
return notCoveredHouseIndex;
public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
final HashSet<Integer> integers = new HashSet<>();
final int[] houseLocationsUnique = new int[houseLocations.length];
int innerCounter = 0;
for (int houseLocation : houseLocations) {
if (integers.contains(houseLocation)) {
houseLocationsUnique[innerCounter] = houseLocation;
return Arrays.copyOf(houseLocationsUnique, innerCounter);
I am pretty sure this implementation is correct. But please see the detail in the functions: findNextTowerIndex and nextHouseNotCoveredIndex: they walk the array one by one!
One of my tests is as follows:
static void test_01() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    final File file = new File("input.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
    final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381);
    assert minNumOfTransmitters == 1;
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
where input.txt can be downloaded from here. (It is not the most important detail in this question, but still..) So we have an array of 73382 houses, and I deliberately set the transmitter range so the methods I have loop a lot:
Here is a sample output from this test in my machine:
findNextTowerIndex looped : 38213
nextHouseNotCoveredIndex looped : 13785
Took: 359 milliseconds..
I also have this test, which does not assert anything, but just keeps time:
static void test_02() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    for (int i = 0; i < 400; i ++) {
        final File file = new File("input.txt");
        final Scanner scanner = new Scanner(file);
        int[] houseLocations = new int[73382];
        for (int counter = 0; counter < 73382; counter++) {
            houseLocations[counter] = scanner.nextInt();
        }
        final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
        final int transmitterRange = ThreadLocalRandom.current().nextInt(1, 70000);
        final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
    }
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
where I randomly create 400 transmitter ranges, and run the program 400 times.. I will get run times as follows in my machine..
Took: 20149 milliseconds..
So then I thought: why don't I use binary search instead of walking the array? I changed my implementation as follows:
public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
    final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
    final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
    int nextTowerIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, farthestHouseLocationAllowed);
    if (nextTowerIndex < 0) {
        // not found: binarySearch returned -(insertionPoint) - 1, so this yields insertionPoint - 1
        nextTowerIndex = -nextTowerIndex;
        nextTowerIndex = nextTowerIndex -2;
    }
    return nextTowerIndex;
}

public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
    final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
    int nextHouseNotCoveredIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, towerCoversUntil);
    if (-nextHouseNotCoveredIndex > houseLocations.length) {
        return -1;
    }
    if (nextHouseNotCoveredIndex < 0) {
        nextHouseNotCoveredIndex = - (nextHouseNotCoveredIndex + 1);
        return nextHouseNotCoveredIndex;
    }
    return nextHouseNotCoveredIndex + 1;
}
and I was expecting a great performance boost, since each search now takes at most O(log N) steps instead of O(N). So test_01 outputs:
Took: 297 milliseconds..
Remember, it was Took: 359 milliseconds.. before. And for test_02:
Took: 18047 milliseconds..
So I always get values around 20 seconds with array walking implementation and 18 - 19 seconds for binary search implementation.
I was expecting a much better performance gain using Arrays.binarySearch but obviously it is not the case, why is this? What am I missing? Do I need an array with more than 73382 to see the benefit, or is it irrelevant?
Edit #01
After @huck_cussler's comment, I tried doubling and tripling the dataset I have (with random numbers) and ran test_02 again (of course also adjusting the array sizes in the test itself). For the linear implementation the times are as follows:
Took: 18789 milliseconds..
Took: 34396 milliseconds..
Took: 53504 milliseconds..
For the binary search implementation, I got values as follows:
Took: 18644 milliseconds..
Took: 33831 milliseconds..
Took: 52886 milliseconds..
Your timing includes the retrieval of data from your hard drive. This could be taking the majority of your runtime. Omit the data load from your timing to get a more accurate comparison of your two approaches. Imagine if it takes up 18 seconds and you're comparing 18.644 vs 18.789 (0.77% improvement) instead of 0.644 vs 0.789 (18.38% improvement).
If you have a linear operation O(n), such as loading a binary structure, and you combine it with a binary search O(log n), you end up with O(n). If you trust Big O notation, then you should expect O(n + log n) to not be significantly different from O(2 * n) as they both reduce to O(n).
Also, a binary search may perform better or worse than a linear search depending on the density of houses between towers. Consider, say 1024 homes with a tower evenly dispersed every 4 homes. A linear search will step 4 times per tower, while a binary search will take log2(1024)=10 steps per tower.
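The 1024-homes example can be made concrete by counting comparisons. The sketch below is an illustration only: SearchStepCount and both methods are made-up names, and the house positions are assumed to be simply 0 to 1023. It counts the steps of a forward walk from a starting index and of a binary search over the whole array, both looking for the last position that is still within a given limit.

```java
final class SearchStepCount {
    // Walks forward from index 'from' while the next element is still <= limit.
    // Returns the number of comparisons made.
    static int linearSteps(int[] sorted, int from, int limit) {
        int steps = 0;
        int i = from;
        while (i + 1 < sorted.length) {
            steps++;
            if (sorted[i + 1] > limit) {
                break;
            }
            i++;
        }
        return steps;
    }

    // Binary search over the whole array for the last element <= limit.
    // Returns the number of loop iterations.
    static int binarySteps(int[] sorted, int limit) {
        int low = 0;
        int high = sorted.length - 1;
        int steps = 0;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1;
            if (sorted[mid] <= limit) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        int[] houses = new int[1024];
        for (int i = 0; i < houses.length; i++) {
            houses[i] = i;
        }
        System.out.println("linear: " + linearSteps(houses, 0, 3));
        System.out.println("binary: " + binarySteps(houses, 3));
    }
}
```

For a target only a few houses ahead, the linear walk needs 4 comparisons here, while the binary search needs about 10, so the asymptotically better algorithm is the slower one in this situation.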
One more thing... your minNumOfTransmitters method is sorting the already-sorted array passed into it from test_01 and test_02. That resorting step takes longer than your searches themselves, which further obscures the timing differences between your two search algorithms.
I created a small timing class to give a better picture of what's happening. I've removed the line of code from minNumOfTransmitters to prevent it from rerunning the sort, and added a boolean param to select whether to use your binary version. It totals the sum of times for 400 iterations, separating out each step. The results on my system illustrate that the load time dwarfs the sort time, which in turn dwarfs the solve time.
Load: 22.565s
Sort: 4.518s
Linear: 0.012s
Binary: 0.003s
It's easy to see how optimizing that last step doesn't make much difference in overall runtime.
private static class Timing {
public long load=0;
public long sort=0;
public long solve1=0;
public long solve2=0;
private String secs(long millis) {
return String.format("%3d.%03ds", millis/1000, millis%1000);
public String toString() {
return " Load: " + secs(load) + "\n Sort: " + secs(sort) + "\nLinear: " + secs(solve1) + "\nBinary: " + secs(solve2);
public void add(Timing timing) {
static Timing test_01() throws FileNotFoundException {
Timing timing=new Timing();
long start = System.currentTimeMillis();
final File file = new File("c:\\path\\to\\xnpwdiG3.txt");
final Scanner scanner = new Scanner(file);
int[] houseLocations = new int[73382];
for (int counter = 0; counter < 73382; counter++) {
houseLocations[counter] = scanner.nextInt();
final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, false);
final int minNumOfTransmittersBin = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, true);
final long end = System.currentTimeMillis();
return timing;
In your time measurement you include operations that are much slower than array search. Namely filesystem I/O and array sorting.
I/O in general (reading/writing from filesystem, network communication) is by orders of magnitude slower than operations that involve only CPU and RAM access.
Let's rewrite your test in a way that does not read the file on every loop iteration:
static void test_02() throws FileNotFoundException {
    final File file = new File("input.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    final int rounds = 400;
    final int[] uniqueHouseLocationsSorted = uniqueHouseLocationsSorted(houseLocations);
    final int transmitterRange = 73381;
    final long start = System.currentTimeMillis();
    for (int i = 0; i < rounds; i++) {
        final int minNumOfTransmitters = minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
    }
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
Notice in this version of the test the file is read only once and time measuring starts after that.
With the above, I get Took: 1700 milliseconds.. (more or less a few millis) for both the iterative version and the binary search. So we still can't see that binary search is faster. That's because almost all of that time goes into sorting the array 400 times.
Now let's remove the line that sorts the input array from the minNumOfTransmitters method. We sort the array (once) anyway at the beginning of the test.
Now we can see that things are much faster. After removing the line houseLocations = uniqueHouseLocationsSorted(houseLocations) from minNumOfTransmitters I get: Took: 68 milliseconds.. for the iterative version. Clearly, since this duration is already very small, we will not see a significant difference with the binary search version.
So let's increase the number of loop rounds to: 100000.
Now I get Took: 2121 milliseconds.. for the iterative version and Took: 36 milliseconds.. for the binary search version.
Because we now isolated what we measure and focus on the array searches, rather than including operations that are much slower, we can notice the big difference in performance (for the better) of binary search.
If you want to see how many times binary search enters its while loop, you can implement it yourself and add a counter:
private static int binarySearch0(int[] a, int fromIndex, int toIndex, int key) {
    int low = fromIndex;
    int high = toIndex - 1;
    int loop = 0; // counts how many times the while loop body runs
    while (low <= high) {
        loop++;
        int mid = (low + high) >>> 1;
        int midVal = a[mid];
        if (midVal < key) {
            low = mid + 1;
        } else if (midVal > key) {
            high = mid - 1;
        } else {
            return mid; // key found
        }
    }
    // only reached when the key is not found
    System.out.println("binary search looped " + loop + " times");
    return -(low + 1); // key not found.
}
The method is copied from the Arrays class in the JDK - I just added the loop counter and the println.
When the length of the array to search is 73382, the loop is entered only 16 times.
That is exactly what we expect: log2(73382) ≈ 16.2.
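As a rough cross-check of that number, here is a minimal sketch that counts the iterations of the same loop without printing. The class and method names (BinarySearchSteps, countIterations, worstCase) are made up for illustration and are not part of the JDK. For an array of n elements, a search for a missing key takes either floor(log2(n)) or floor(log2(n)) + 1 iterations.

```java
final class BinarySearchSteps {
    /** Same loop as the JDK binary search, but returns the iteration count instead of an index. */
    static int countIterations(int[] sorted, int key) {
        int low = 0;
        int high = sorted.length - 1;
        int iterations = 0;
        while (low <= high) {
            iterations++;
            int mid = (low + high) >>> 1;
            if (sorted[mid] < key) {
                low = mid + 1;
            } else if (sorted[mid] > key) {
                high = mid - 1;
            } else {
                break; // key found
            }
        }
        return iterations;
    }

    /** Upper bound on the iteration count for n >= 1 elements: floor(log2(n)) + 1. */
    static int worstCase(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int n = 73382;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) {
            sorted[i] = 2 * i; // only even values, so every odd key is missing
        }
        System.out.println("log2(n) = " + Math.log(n) / Math.log(2));
        System.out.println("worst case = " + worstCase(n));
        System.out.println("iterations for a missing key = " + countIterations(sorted, 1));
    }
}
```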
I agree with the other answers that the main issue with your tests is that they measure the wrong things: I/O and sorting. But I don't think the suggested tests are good either. My suggestion is the following:
static void test_02() throws FileNotFoundException {
    final File file = new File("43620487.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    final int[] uniqueHouseLocationsSorted = uniqueHouseLocationsSorted(houseLocations);
    final Random random = new Random(0); // fixed seed to have the same sequences in all tests
    long sum = 0;
    // warm up
    for (int i = 0; i < 100; i++) {
        final int transmitterRange = random.nextInt(70000) + 1;
        final int minNumOfTransmitters = minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
        sum += minNumOfTransmitters;
    }
    // actual measure
    final long start = System.currentTimeMillis();
    for (int i = 0; i < 4000; i++) {
        final int transmitterRange = random.nextInt(70000) + 1;
        final int minNumOfTransmitters = minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
        sum += minNumOfTransmitters;
    }
    final long end = System.currentTimeMillis();
    // printing sum keeps the results "used", so the calls cannot be optimized away
    System.out.println("Took: " + (end - start) + " milliseconds. Sum = " + sum);
}
Note also that I removed all System.out.println calls from findNextTowerIndex and nextHouseNotCoveredIndex, as well as the uniqueHouseLocationsSorted call from minNumOfTransmitters, because they affect the performance test as well.
So what I think is important here:
Move all I/O and sorting out of the measurement loop
Perform some warm up outside of measurement
Use the same random sequence for all measurements
Don't discard the result of the calculation, so the JIT can't optimize the call away altogether
With such a test I see about a 10x difference on my machine: around 80ms vs around 8ms.
And if you really want to do performance tests in Java, you should consider using JMH (the Java Microbenchmark Harness).
I agree with the other answers: I/O takes the most time, sorting comes second, and the search itself is the smallest time consumer.
I also agree with phatfingers's example: in your problem, binary search can sometimes be worse than linear search. Over the whole run, the linear search visits every element once (n comparisons), whereas the binary search runs once per tower (O(log n) * #towers). One suggestion is to start the binary search from the current location rather than from 0:
int nextTowerIndex = Arrays.binarySearch(houseLocations, houseNotCoveredIndex + 1, houseLocations.length, farthestHouseLocationAllowed);
Then it should be roughly O(log n) * #towers / 2.
Going further, you could compute how many houses each tower covers on average, compare against that average first, and then start the binary search from houseNotCoveredIndex + avg + 1, but I'm not sure the performance would be much better.
PS: for sorting and removing duplicates you can use a TreeSet:
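To make the ranged search concrete, here is a minimal sketch. The helper name lastIndexAtMost and the sample data are assumptions for illustration, not code from the question. It uses the four-argument Arrays.binarySearch overload and converts a "not found" result into the index of the last element that is still within the limit.

```java
import java.util.Arrays;

final class RangedSearch {
    /**
     * Index of the last element in sorted[from..] that is <= limit,
     * or from - 1 if no such element exists. Assumes sorted has no duplicates.
     */
    static int lastIndexAtMost(int[] sorted, int from, int limit) {
        int pos = Arrays.binarySearch(sorted, from, sorted.length, limit);
        // A negative result encodes the insertion point as -(insertionPoint) - 1,
        // so the last element <= limit sits at insertionPoint - 1 = -pos - 2.
        return pos >= 0 ? pos : -pos - 2;
    }

    public static void main(String[] args) {
        int[] houses = {1, 3, 7, 12, 20};
        // Search only the part of the array that is not covered yet (index 2 onwards).
        System.out.println(lastIndexAtMost(houses, 2, 15)); // prints 3 (the house at 12)
        System.out.println(lastIndexAtMost(houses, 2, 12)); // prints 3 (exact match)
        System.out.println(lastIndexAtMost(houses, 2, 5));  // prints 1 (nothing in range)
    }
}
```

Passing the start index means each search only looks at the houses that are still uncovered, which is the saving the suggestion above is after.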
public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
    // a TreeSet keeps its elements sorted and drops duplicates
    final Set<Integer> integers = new TreeSet<>();
    for (int houseLocation : houseLocations) {
        integers.add(houseLocation);
    }
    int[] unique = new int[integers.size()];
    int i = 0;
    for (Integer loc : integers) {
        unique[i] = loc;
        i++;
    }
    return unique;
}
uniqueHouseLocationsSorted is not efficient. Andy's solution seems better, but I think this could reduce the time spent (note that I did not test the code):
public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
    int size = houseLocations.length;
    if (size == 0) return null; // you have to check for null later or maybe throw an exception here
    Arrays.sort(houseLocations); // sorts the caller's array in place; the loop below relies on sorted input
    final int[] houseLocationsUnique = new int[size];
    int previous = houseLocationsUnique[0] = houseLocations[0];
    int innerCounter = 1;
    for (int i = 1; i < size; i++) {
        int houseLocation = houseLocations[i];
        if (houseLocation == previous) continue; // since elements are sorted this is faster
        previous = houseLocationsUnique[innerCounter++] = houseLocation;
    }
    return Arrays.copyOf(houseLocationsUnique, innerCounter);
}
Also consider using an ArrayList, since copying the array takes time.
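For comparison, the same "sort and remove duplicates" step can also be written with the JDK's IntStream. This is only a sketch of an alternative (the class name UniqueSorted is made up), and I have not benchmarked it against the versions above, so treat it as a readability option rather than a speed claim.

```java
import java.util.Arrays;

final class UniqueSorted {
    /** Returns the distinct values of the input in ascending order; the input array is left untouched. */
    static int[] uniqueSorted(int[] values) {
        return Arrays.stream(values).sorted().distinct().toArray();
    }

    public static void main(String[] args) {
        int[] houses = {5, 1, 5, 3, 1};
        System.out.println(Arrays.toString(uniqueSorted(houses))); // prints [1, 3, 5]
    }
}
```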
I want to implement an entropy function in parallel with Aparapi.
In that function I need to count the occurrences of the different keys in a vector, but it does not execute correctly.
Assume that we have just 3 different values.
Here is my code:
final int[] V = new int[1024];
// Initialization for V values
final int[] count = new int[3];
Kernel kernel = new Kernel(){
    public void run(){
        int gid = getGlobalId();
        count[V[gid]]++;
    }
};
kernel.execute(V.length);
After running this code segment, when I print the count[] values I get 1, 1, 1.
It seems that count[V[gid]]++ executes just once for each V[gid].
So here is the problem. The ++ operator is actually three operations in one: read the current value, increment it, write the new value. In Aparapi you potentially have 1024 GPU threads running simultaneously. That means they will all read the value, probably at the same time while it is still 0, then increment it to 1, and then all 1024 threads will write 1. So it is acting as expected.
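The interleaving described above can be reproduced deterministically in plain Java. This is only a simulation under the assumption that every thread reads before any thread writes. It uses no Aparapi and no real threads, and the class name LostUpdateSimulation is made up for illustration.

```java
import java.util.Arrays;

final class LostUpdateSimulation {
    /** Simulates every "thread" reading count[v[gid]] before any thread writes its result back. */
    static int[] simulateRacyIncrement(int[] v, int keys) {
        int[] count = new int[keys];
        int[] observed = new int[v.length];
        for (int gid = 0; gid < v.length; gid++) {
            observed[gid] = count[v[gid]];     // step 1: read (every thread still sees 0)
        }
        for (int gid = 0; gid < v.length; gid++) {
            count[v[gid]] = observed[gid] + 1; // steps 2 and 3: increment, then write
        }
        return count;
    }

    public static void main(String[] args) {
        int[] v = {0, 1, 2, 0, 1, 2, 2, 2};
        System.out.println(Arrays.toString(simulateRacyIncrement(v, 3))); // prints [1, 1, 1]
    }
}
```

Each key ends up with a count of 1 no matter how often it occurs, which matches the 1, 1, 1 output in the question.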
What you are trying to do is called a Map-reduce function. You are just skipping a lot of steps. You need to remember Aparapi is a system that has no Thread safety, so you have to write your algorithms to accommodate that. That is where Map-reduce comes in and here is how to do one. I just wrote it and added it to the Aparapi repository at its new home, details below.
int size = 1024;
final int count = 3;
final int[] V = new int[size];
//lets fill in V randomly...
for (int i = 0; i < size; i++) {
    //random number either 0, 1, or 2
    V[i] = (int) (Math.random() * 3);
}
//this will hold our values between the phases.
int[][] totals = new int[count][size];
final int[][] kernelTotals = totals;
// map phase: each thread writes only to its own column, so there are no conflicting writes
Kernel mapKernel = new Kernel() {
    public void run() {
        int gid = getGlobalId();
        int value = V[gid];
        for(int index = 0; index < count; index++) {
            if (value == index)
                kernelTotals[index][gid] = 1;
        }
    }
};
mapKernel.execute(Range.create(size));
mapKernel.dispose();
totals = kernelTotals;
// reduce phase: halve the number of columns on every pass by adding neighbouring pairs
while (size > 1) {
    int nextSize = size / 2;
    final int[][] currentTotals = totals;
    final int[][] nextTotals = new int[count][nextSize];
    Kernel reduceKernel = new Kernel() {
        public void run() {
            int gid = getGlobalId();
            for(int index = 0; index < count; index++) {
                nextTotals[index][gid] = currentTotals[index][gid * 2] + currentTotals[index][gid * 2 + 1];
            }
        }
    };
    reduceKernel.execute(Range.create(nextSize));
    reduceKernel.dispose();
    totals = nextTotals;
    size = nextSize;
}
assert size == 1;
// Done, just print it out //
int[] results = new int[3];
results[0] = totals[0][0];
results[1] = totals[1][0];
results[2] = totals[2][0];
System.out.println(Arrays.toString(results));
Keep in mind that while this may seem inefficient, it actually works pretty well on much larger inputs. This algorithm works just fine with size = 1048576.
With the new size the following result was computed on my system in about a second.
[349602, 349698, 349276]
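To see the two phases without a GPU, here is a plain-Java sketch of the same map and pairwise-reduce idea. It assumes the input length is a power of two, as in the sizes used above, and the class name TreeReduce is made up. The inner loops run sequentially here, but each gid iteration touches only its own output cell, which is what makes the scheme safe to run as parallel kernel threads.

```java
import java.util.Arrays;

final class TreeReduce {
    /** Counts occurrences of the keys 0..keys-1 in v. Assumes v.length is a power of two. */
    static int[] countKeys(int[] v, int keys) {
        int size = v.length;
        int[][] totals = new int[keys][size];
        // map phase: position gid marks a 1 in the row of the key it holds
        for (int gid = 0; gid < size; gid++) {
            totals[v[gid]][gid] = 1;
        }
        // reduce phase: every pass adds neighbouring pairs and halves the width
        while (size > 1) {
            int nextSize = size / 2;
            int[][] next = new int[keys][nextSize];
            for (int gid = 0; gid < nextSize; gid++) {
                for (int key = 0; key < keys; key++) {
                    next[key][gid] = totals[key][gid * 2] + totals[key][gid * 2 + 1];
                }
            }
            totals = next;
            size = nextSize;
        }
        int[] result = new int[keys];
        for (int key = 0; key < keys; key++) {
            result[key] = totals[key][0];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v = {0, 1, 2, 2, 1, 0, 0, 0};
        System.out.println(Arrays.toString(countKeys(v, 3))); // prints [4, 2, 2]
    }
}
```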
One final note: you might want to consider moving to the more active project at aparapi.com. It includes several bug fixes and a lot of extra features and performance enhancements over the older library you linked above. It is also in Maven Central with about a dozen releases, so it is easier to use. I just wrote the code in this answer, but decided to add it to the examples section of the new Aparapi repository, where you can find it.
I have an int array and a float array, each of length 220 million (fixed). I want to store those arrays to disk and load them back into memory. Currently, I am using Java NIO's FileChannel and MappedByteBuffer for this. It works fine, but it takes about 5 seconds (wall-clock time) to store or load an array. Now I want to make it faster.
I should mention that most of the array elements are 0 (nearly 52%).
int arr1 [] = { 0 , 0 , 6 , 7 , 1, 0 , 0 ...}
Is there a good way to improve the speed by not storing or loading those 0's? This could be compensated for by using Arrays.fill(array, 0).
The following approach requires n / 8 + nz * 4 bytes on disk, where n is the size of the array, and nz the number of non-zero entries. For 52% zero entries, you'd reduce storage size by 52% - 3% = 49%.
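The arithmetic behind that estimate, worked through for the sizes in the question (220 million ints, 52% of them zero). The class and method names are made up for this sketch.

```java
final class BitmapSizeEstimate {
    /** Plain storage: 4 bytes per int. */
    static long plainBytes(long n) {
        return 4L * n;
    }

    /** Bitmap storage: one bit per element plus 4 bytes per non-zero element. */
    static long bitmapBytes(long n, long nonZero) {
        return n / 8 + 4L * nonZero;
    }

    public static void main(String[] args) {
        long n = 220_000_000L;
        long nonZero = n * 48 / 100; // 52% zeroes leaves 48% non-zero entries
        long plain = plainBytes(n);            // 880,000,000 bytes
        long packed = bitmapBytes(n, nonZero); // 27,500,000 + 422,400,000 = 449,900,000 bytes
        System.out.printf("plain = %d bytes, bitmap = %d bytes, saved = %.1f%%%n",
                plain, packed, 100.0 * (plain - packed) / plain);
    }
}
```

The bitmap costs n / 8 bytes against 4n bytes of plain data, which is the roughly 3% overhead subtracted from the 52% saving above.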
You could do:
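// write(...), readInt() and readBitSet() stand for whatever I/O helpers you use
void write(int[] array) {
    BitSet zeroes = new BitSet(array.length);
    for (int i = 0; i < array.length; i++)
        zeroes.set(i, array[i] == 0);
    write(array.length); // BitSet.length() only reports the highest set bit, so store the size explicitly
    write(zeroes); // one bit per index
    for (int i = 0; i < array.length; i++)
        if (array[i] != 0)
            write(array[i]);
}
int[] read() {
    int[] array = new int[readInt()];
    BitSet zeroes = readBitSet();
    for (int i = 0; i < array.length; i++) {
        if (zeroes.get(i)) {
            // nothing to do (array[i] was initialized to 0)
        } else {
            array[i] = readInt();
        }
    }
    return array;
}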
Edit: That you say this is slightly slower implies that the disk is not the bottleneck. You could tune the above approach by writing the bitset as you construct it, so you don't have to write the bitset to memory before writing it to disk. Also, by writing the bitset word by word interspersed with the actual data we can do only a single pass over the array, reducing cache misses:
void write(int[] array) {
    write(array.length); // read() expects the length first
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        int zeroesMap = 0;
        // build the word back to front so that bit 0 describes array[i]
        for (int j = i + 31; j >= i; j--) {
            zeroesMap <<= 1;
            if (array[j] == 0) {
                zeroesMap |= 1;
            }
        }
        write(zeroesMap);
        for (int j = i; j < ni; j++)
            if (array[j] != 0) {
                write(array[j]);
            }
    }
}
int[] read() {
    int[] array = new int[readInt()];
    int ni;
    for (int i = 0; i < array.length; i = ni) {
        ni = i + 32;
        int zeroesMap = readInt();
        for (int j = i; j < ni; j++) {
            if ((zeroesMap & 1) == 1) {
                // nothing to do (array[j] was initialized to 0)
            } else {
                array[j] = readInt();
            }
            zeroesMap >>>= 1;
        }
    }
    return array;
}
(The preceding code assumes array.length is a multiple of 32. If it is not, write the last slice of the array in whatever way you like.)
If that doesn't reduce processing time either, compression is not the way to go (I don't think any general-purpose compression algorithm will be faster than the above).
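For experimenting, the single-pass scheme above can be made self-contained with DataOutputStream and DataInputStream over in-memory byte arrays. This is a sketch under my own assumptions, not the code from the answer: the class name ZeroBitmapCodec is made up, and the last slice is simply handled with a shorter bitmap word, so the length does not have to be a multiple of 32.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ZeroBitmapCodec {
    /** Layout: length, then per 32-element slice one bitmap word followed by the non-zero values. */
    static byte[] encode(int[] array) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(array.length);
            for (int i = 0; i < array.length; i += 32) {
                int end = Math.min(i + 32, array.length);
                int zeroesMap = 0;
                for (int j = end - 1; j >= i; j--) { // bit 0 ends up describing array[i]
                    zeroesMap <<= 1;
                    if (array[j] == 0) {
                        zeroesMap |= 1;
                    }
                }
                out.writeInt(zeroesMap);
                for (int j = i; j < end; j++) {
                    if (array[j] != 0) {
                        out.writeInt(array[j]);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int[] array = new int[in.readInt()];
            for (int i = 0; i < array.length; i += 32) {
                int end = Math.min(i + 32, array.length);
                int zeroesMap = in.readInt();
                for (int j = i; j < end; j++) {
                    if ((zeroesMap & 1) == 0) {
                        array[j] = in.readInt(); // bit clear: a real value follows
                    }
                    zeroesMap >>>= 1;
                }
            }
            return array;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For the example array { 0, 0, 6, 7, 1, 0, 0 } this produces 20 bytes: 4 for the length, 4 for the bitmap word, and 12 for the three non-zero values.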
Depending upon the distribution, consider Run-length Encoding:
Run-length encoding (RLE) is a very simple form of data compression in which runs of data (that is, sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. This is most useful on data that contains many such runs.
It is simple ... which is good, and possibly bad, here ;-)
If you are willing to write the serialization/deserialization code yourself, then instead of storing all the zeroes you can store a series of ranges that indicate where those zeroes are (with a special marker), together with the actual non-zero data.
So the array in your example: { 0 , 0 , 6 , 7 , 1, 0 , 0 ...}
can be stored as:
%0-1, 6, 7, 1 %5-6
When reading this data, if you hit % it means you have a range in front of you: you read the start and the end and fill in zeroes. If you then see a token that does not start with %, you have hit an actual value.
In a sparse array that has large sequences of consecutive values this will yield great compression.
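A binary variant of this idea can be sketched as follows. Instead of the textual % marker with start and end positions, this sketch (my own simplification, with the made-up class name ZeroRunCodec) uses the value 0 itself as the marker, followed by the length of the zero run. That is unambiguous because a literal 0 never appears as data in the encoded form.

```java
import java.util.Arrays;

final class ZeroRunCodec {
    /** Layout: original length, then non-zero values as-is and each zero run as the pair (0, runLength). */
    static int[] encode(int[] array) {
        int[] out = new int[2 * array.length + 1]; // generous upper bound on the encoded size
        int o = 0;
        out[o++] = array.length;
        for (int i = 0; i < array.length; ) {
            if (array[i] != 0) {
                out[o++] = array[i++];
            } else {
                int start = i;
                while (i < array.length && array[i] == 0) {
                    i++;
                }
                out[o++] = 0;         // marker: a zero run follows
                out[o++] = i - start; // number of zeroes in the run
            }
        }
        return Arrays.copyOf(out, o);
    }

    static int[] decode(int[] encoded) {
        int[] array = new int[encoded[0]];
        int i = 0;
        for (int e = 1; e < encoded.length; e++) {
            if (encoded[e] != 0) {
                array[i++] = encoded[e];
            } else {
                i += encoded[++e]; // skip the run; the new array already holds zeroes there
            }
        }
        return array;
    }

    public static void main(String[] args) {
        int[] data = {0, 0, 6, 7, 1, 0, 0};
        System.out.println(Arrays.toString(encode(data))); // prints [7, 0, 2, 6, 7, 1, 0, 2]
    }
}
```

Note that isolated zeroes cost two ints instead of one, so this only pays off when the zeroes really come in runs.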
There are standard compression utilities in Java: java.util.zip. It's a general-purpose library, but due to its sheer availability it is an OK solution. Specialized compression or encoding should be researched if the need arises, and I rarely recommend zip as the solution of choice.
Here is a sample of how to handle zip via Deflater/Inflater.
Most people know ZipInputStream/ZipOutputStream (and especially the GZIP streams). All of them have downsides in handling the copy from memory to zlib, especially GZIP, which is a total disaster because its CRC32 calls native code (calling native code removes the ability to optimize and introduces some more performance hits).
A few important notes: do not set the zip compression level high, as that will kill performance. Of course, one can experiment to find the best trade-off between CPU and disk activity.
The code also demonstrates one of the real shortcomings of java.util.zip: it doesn't support direct buffers (Java 11 later added ByteBuffer overloads to Deflater and Inflater). Adding that support would be trivial, yet no one had bothered to do it. Direct buffers would save a few memory copies and reduce the memory footprint.
Last note: there is a Java version of zlib (jzlib), and it beats the native implementation on compression quite nicely.
package t1;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
public class ZInt {
    private static final int bucketSize = 1<<17;//in real world should not be const, but we bored horribly
    static final int zipLevel = 2;//feel free to experiment, higher compression (5+) is likely to be a total waste
    static void write(int[] a, File file, boolean sync) throws IOException{
        byte[] bucket = new byte[Math.min(bucketSize, Math.max(1<<13, Integer.highestOneBit(a.length >>3)))];//128KB bucket
        byte[] zipOut = new byte[bucket.length];
        final FileOutputStream fout = new FileOutputStream(file);
        FileChannel channel = fout.getChannel();
        try{
            ByteBuffer buf = ByteBuffer.wrap(bucket);
            //unfortunately java.util.zip doesn't support Direct Buffer - that would be the perfect fit
            ByteBuffer out = ByteBuffer.wrap(zipOut);
            out.putInt(a.length);//write length aka header
            if (a.length==0){
                doWrite(channel, out, 0);
                return;
            }
            Deflater deflater = new Deflater(zipLevel, false);
            try{
                for (int i=0;i<a.length;){
                    i = put(a, buf, i);
                    buf.flip();//switch the bucket from filling to draining
                    deflater.setInput(bucket, buf.position(), buf.limit());
                    if (i==a.length)
                        deflater.finish();
                    //hacking and using bucket here is tempting since it's copied twice but well
                    for (int n; (n= deflater.deflate(zipOut, out.position(), out.remaining()))>0;){
                        doWrite(channel, out, n);
                    }
                    buf.clear();
                }
            }finally{
                deflater.end();//release the native zlib resources
            }
        }finally{
            if (sync)
                fout.getFD().sync();
            channel.close();
        }
    }
    static int[] read(File file) throws IOException, DataFormatException{
        FileChannel channel = new FileInputStream(file).getChannel();
        try{
            byte[] in = new byte[(int)Math.min(bucketSize, channel.size())];
            ByteBuffer buf = ByteBuffer.wrap(in);
            channel.read(buf);
            buf.flip();
            int[] a = new int[buf.getInt()];
            if (a.length==0)
                return a;
            int i=0;
            byte[] inflated = new byte[Math.min(1<<17, a.length*4)];
            ByteBuffer intBuffer = ByteBuffer.wrap(inflated);
            Inflater inflater = new Inflater(false);
            try{
                do{
                    if (!buf.hasRemaining()){
                        buf.clear();
                        channel.read(buf);
                        buf.flip();
                    }
                    inflater.setInput(in, buf.position(), buf.remaining());
                    buf.position(buf.position()+buf.remaining());//simulate all read
                    for (;;){
                        int n = inflater.inflate(inflated,intBuffer.position(), intBuffer.remaining());
                        if (n==0)
                            break;
                        intBuffer.position(intBuffer.position()+n).flip();
                        for (;intBuffer.remaining()>3 && i<a.length;i++){//need at least 4 bytes to form an int
                            a[i] = intBuffer.getInt();
                        }
                        intBuffer.compact();//keep the leftover bytes of a partial int
                    }
                }while (channel.position()<channel.size() && i<a.length);
            }finally{
                inflater.end();
            }
            // System.out.printf("read ints: %d - channel.position:%d %n", i, channel.position());
            return a;
        }finally{
            channel.close();
        }
    }
    private static void doWrite(FileChannel channel, ByteBuffer out, int n) throws IOException {
        out.position(out.position()+n).flip();//account for the n bytes the deflater wrote into the backing array
        while (out.hasRemaining())
            channel.write(out);
        out.clear();
    }
    private static int put(int[] a, ByteBuffer buf, int i) {
        for (;buf.hasRemaining() && i<a.length;){
            buf.putInt(a[i++]);
        }
        return i;
    }
    private static int[] generateRandom(int len){
        Random r = new Random(17);
        int[] n = new int[len];
        for (int i=0;i<len;i++){
            n[i]= r.nextBoolean()?0: r.nextInt(1<<23);//limit bounds to have any sensible compression
        }
        return n;
    }
    public static void main(String[] args) throws Throwable{
        File file = new File("xxx.xxx");
        int[] n = generateRandom(3000000); //{0,2,4,1,2,3};
        long start = System.nanoTime();
        write(n, file, false);
        long elapsed = System.nanoTime() - start;//elapsed will be fairer if the sync is true
        System.out.printf("File length: %d, for %d ints, ratio %.2f in %.2fms %n", file.length(), n.length, ((double)file.length())/4/n.length, java.math.BigDecimal.valueOf(elapsed, 6) );
        int[] m = read(file);
        //compare, Arrays.equals doesn't return position, so it sucks/kinda
        for (int i=0; i<n.length; i++){
            if (m[i]!=n[i]){
                System.err.printf("Failed at %d%n",i);
                break;
            }
        }
        System.out.printf("All done!");
    }
}
Please note, the code is not a proper benchmark!
The delayed reply comes from the fact that it was quite boring to code yet another zip example, sorry.
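For readers who only want to see the Deflater/Inflater calls without the file and channel handling, here is a much smaller in-memory round trip. It is a sketch, not a replacement for the code above: the class name DeflateRoundTrip is made up, it holds everything in memory at once, and it uses the default zlib wrapper rather than the bucketed streaming shown in ZInt.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class DeflateRoundTrip {
    static byte[] compress(int[] values, int level) {
        ByteBuffer raw = ByteBuffer.allocate(4 * values.length);
        raw.asIntBuffer().put(values); // big-endian ints into the backing array
        Deflater deflater = new Deflater(level);
        deflater.setInput(raw.array());
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end(); // release the native zlib resources
        return out.toByteArray();
    }

    /** count is the number of ints originally compressed; the caller has to store it separately. */
    static int[] decompress(byte[] compressed, int count) {
        byte[] raw = new byte[4 * count];
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        try {
            int off = 0;
            while (off < raw.length && !inflater.finished()) {
                int n = inflater.inflate(raw, off, raw.length - off);
                if (n == 0) {
                    break; // no progress: the input is truncated or needs a dictionary
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        int[] values = new int[count];
        ByteBuffer.wrap(raw).asIntBuffer().get(values);
        return values;
    }
}
```

As with the longer example, a low level such as 2 is a reasonable starting point for experiments. The element count has to travel alongside the compressed bytes, which is what the header int does in the file-based version.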