I am solving a problem on Codeforces, programming in Java.
In this problem I create an array dp[N][5][3] of ints (there are about N*5*3 recursive calls). When N is a million, my program exceeds the memory limit, even though the limit is about 256 MB. The same solution in C++ runs fine and uses about half the memory. Why is that?
Here is the code:
private int MAX = (int) 1E6 + 6;
private int[] cnt;
private int[][][] dp;
private int min(int a, int b) {
return a > b ? b : a;
}
private int max(int a, int b) {
return a < b ? b : a;
}
private int solve(int x, int t1, int t2) {
// element x, x used t1 times, x + 1 used t2 times
if (dp[x][t1][t2] != -1)
return dp[x][t1][t2];
else if (x + 3 > MAX)
return dp[x][t1][t2] = (cnt[x] - t1) / 3 + (cnt[x + 1] - t2) / 3;
int ans0, ans1 = 0, ans2 = 0;
ans0 = (cnt[x] - t1) / 3 + solve(x + 1, t2, 0);
int min = min(cnt[x] - t1, min(cnt[x + 1] - t2, cnt[x + 2]));
if (min >= 1)
ans1 = (cnt[x] - t1 - 1) / 3 + 1 + solve(x + 1, t2 + 1, 1);
if (min >= 2)
ans2 = (cnt[x] - t1 - 2) / 3 + 2 + solve(x + 1, t2 + 2, 2);
return dp[x][t1][t2] = max(ans0, max(ans1, ans2));
}
private void solve(InputReader in, PrintWriter out) {
int n = in.nextInt();
int m = in.nextInt();
cnt = new int[MAX];
for (int i = 0; i < n; i++) cnt[in.nextInt()]++;
dp = new int[MAX][5][3];
for (int i = 0; i <= m; i++)
for (int j = 0; j < 5; j++)
for (int k = 0; k < 3; k++)
dp[i][j][k] = -1;
out.println(solve(1, 0, 0));
}
There is no need to understand the logic of the solve function; the point is that the recursive method is simply called about N*5*3 times.
If you absolutely need such a large array (and I recommend you think twice and look for a more memory-efficient solution before using one), you can increase the maximum memory size of a Java program at launch time with the -Xmx argument.
Eg: java -Xmx1G -jar myProgram.jar
This uses 1 GB as the maximum heap size (note that JVM options must come before -jar, and if 1 GB is not enough you can increase it further).
Note that you can also specify the initial heap size with the -Xms argument.
It depends on your code. First of all, in a recursive method the JVM stores each object on the heap and holds a reference to it on the stack. If you want to prevent this error, you should set object references you no longer need to null.
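For the memory question itself, one plausible explanation (an assumption about HotSpot's object layout, not something measured on the asker's setup): a Java int[MAX][5][3] is not one contiguous block but roughly six million small array objects, each with its own object header and alignment padding, which can easily double the footprint compared to the C++ equivalent. A sketch of flattening the table into a single int[] with a hypothetical idx helper:

```java
// Sketch, not the asker's exact solution: replacing int[MAX][5][3] with
// one flat int[] removes millions of small array objects, each of which
// carries a ~16-byte header plus a reference slot in its parent array.
public class FlatDp {
    static final int MAX = 1_000_006;
    static final int[] dp = new int[MAX * 5 * 3]; // one 60 MB block

    // Maps (x, t1, t2) with t1 in [0,5) and t2 in [0,3) to a flat index.
    static int idx(int x, int t1, int t2) {
        return (x * 5 + t1) * 3 + t2;
    }

    public static void main(String[] args) {
        java.util.Arrays.fill(dp, -1);
        dp[idx(42, 3, 2)] = 7;
        System.out.println(dp[idx(42, 3, 2)]); // 7
    }
}
```

The memoization lookups then become dp[idx(x, t1, t2)] instead of dp[x][t1][t2], with no change to the recursion itself.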
I need help with solving the second level of Google's Foobar challenge.
Commander Lambda uses an automated algorithm to assign minions randomly to tasks, in order to keep her minions on their toes. But you've noticed a flaw in the algorithm - it eventually loops back on itself, so that instead of assigning new minions as it iterates, it gets stuck in a cycle of values so that the same minions end up doing the same tasks over and over again. You think proving this to Commander Lambda will help you make a case for your next promotion.
You have worked out that the algorithm has the following process:
1) Start with a random minion ID n, which is a nonnegative integer of length k in base b
2) Define x and y as integers of length k. x has the digits of n in descending order, and y has the digits of n in ascending order
3) Define z = x - y. Add leading zeros to z to maintain length k if necessary
4) Assign n = z to get the next minion ID, and go back to step 2
For example, given minion ID n = 1211, k = 4, b = 10, then x = 2111, y = 1112 and z = 2111 - 1112 = 0999. Then the next minion ID will be n = 0999 and the algorithm iterates again: x = 9990, y = 0999 and z = 9990 - 0999 = 8991, and so on.
Depending on the values of n, k (derived from n), and b, at some point the algorithm reaches a cycle, such as by reaching a constant value. For example, starting with n = 210022, k = 6, b = 3, the algorithm will reach the cycle of values [210111, 122221, 102212] and it will stay in this cycle no matter how many times it continues iterating. Starting with n = 1211, the routine will reach the integer 6174, and since 7641 - 1467 is 6174, it will stay as that value no matter how many times it iterates.
Given a minion ID as a string n representing a nonnegative integer of length k in base b, where 2 <= k <= 9 and 2 <= b <= 10, write a function solution(n, b) which returns the length of the ending cycle of the algorithm above starting with n. For instance, in the example above, solution(210022, 3) would return 3, since iterating on 102212 would return to 210111 when done in base 3. If the algorithm reaches a constant, such as 0, then the length is 1.
Test Cases: Solution.solution("1211", 10) returns 1
Solution.solution("210022", 3) returns 3
Here is my code:
import java.util.ArrayList;
import java.util.Arrays;
public class Solution {
public static int solution(String n, int b) {
int k = n.length();
String m = n;
ArrayList<String> minionID = new ArrayList<>();
while (!minionID.contains(m)) {
minionID.add(m);
char[] s = m.toCharArray();
Arrays.sort(s);
int y = Integer.parseInt(toString(s));
int x = Integer.parseInt(reverseString(s));
if (b == 10) {
int intM = x - y;
m = Integer.toString(intM);
} else {
int intM10 = ((int) Integer.parseInt(toBase10(x,b))) - ((int) Integer.parseInt(toBase10(y, b)));
m = toBaseN(intM10, b);
}
m = addLeadingZeros(k, m);
}
System.out.println(minionID);
return minionID.size() - minionID.indexOf(m);
}
private static String toBaseN (int intBase10, int b) {
int residual = intBase10;
ArrayList<String> digitsBaseN = new ArrayList<>();
while (residual >= b) {
int r = residual % b;
digitsBaseN.add(Integer.toString(residual));
residual = (residual - r) / b;
}
digitsBaseN.add(Integer.toString(residual));
StringBuilder reverseDigits = new StringBuilder();
for (int i = digitsBaseN.size() -1; i >= 0; i--) {
reverseDigits.append(digitsBaseN.get(i));
}
return reverseDigits.toString();
}
private static String toBase10 (int intBaseN, int b) {
int[] xArr = new int[Integer.toString(intBaseN).length()];
int count = 0;
for (int i = xArr.length - 1; i >= 0; i--) {
xArr[count] = Integer.toString(intBaseN).charAt(i) - '0';
count++;
}
int yBase10 = 0;
for(int i = 0; i < xArr.length; i++) {
yBase10 += xArr[i] * (Math.pow(b, i));
}
return Integer.toString(yBase10);
}
public static String toString(char[] arr) {
StringBuilder newString = new StringBuilder();
for (char c : arr) {
newString.append(c);
}
if (newString.toString().contains("-")) {
newString.deleteCharAt(0);
}
return newString.toString();
}
public static String reverseString(char[] arr) {
StringBuilder newString = new StringBuilder();
for (int i = arr.length - 1; i >= 0; i--) {
newString.append(arr[i]);
}
if (newString.toString().contains("-")) {
newString.deleteCharAt(newString.length()-1);
}
return newString.toString();
}
public static String addLeadingZeros(int k, String z) {
if (k > z.length()) {
String zeros = "";
for (int i = 0; i < (k - z.length()); i++) {
zeros += "0";
}
zeros += z;
return zeros;
}
return z;
}
It only works for three out of the ten test cases.
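One pitfall worth checking (an observation, not necessarily the only bug here): round-tripping base-b digit strings through base-10 ints is easy to get wrong, and the JDK can do the radix work directly via Integer.parseInt(String, radix) and Integer.toString(int, radix). A sketch of one iteration of the algorithm using them; the names are mine, not the asker's:

```java
// Hedged sketch of a single step of the minion-ID algorithm using the
// JDK's built-in radix support (bases 2..36) instead of manual conversion.
public class BaseStep {
    static String step(String n, int b) {
        int k = n.length();
        char[] digits = n.toCharArray();
        java.util.Arrays.sort(digits);                    // ascending order
        String asc = new String(digits);
        String desc = new StringBuilder(asc).reverse().toString();
        int z = Integer.parseInt(desc, b) - Integer.parseInt(asc, b);
        String s = Integer.toString(z, b);
        while (s.length() < k) s = "0" + s;               // restore leading zeros
        return s;
    }

    public static void main(String[] args) {
        System.out.println(step("1211", 10));   // 0999
        System.out.println(step("210022", 3));  // next ID in base 3
    }
}
```

For the stated constraints (k <= 9, b <= 10) every value fits comfortably in an int, since the largest possible ID is below 10^9.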
def answer(n, b):
    k = len(n)
    m = n
    mini_id = []
    while m not in mini_id:
        mini_id.append(m)
        s = sorted(m)
        x_descend = ''.join(s[::-1])
        y_ascend = ''.join(s)
        if b == 10:
            int_m = int(x_descend) - int(y_ascend)
            m = str(int_m)
        else:
            int_m_10 = int(to_base_10(x_descend, b)) - int(to_base_10(y_ascend, b))
            m = to_base_n(str(int_m_10), b)
        m = (k - len(m)) * '0' + m
    return len(mini_id) - mini_id.index(m)
We are trying to optimize heavy memory operations in Java and ran into some anomalies. From our data we formed the hypothesis that an array/memory block might be loaded into the CPU cache after many accesses, but that after cloning this array many times the cache fills up and the initial array is evicted back to RAM.
To test this, we set up a benchmark. It does the following:
Create an array with a given size
Write some data into the fields
Read/iterate it a million times (to push it into CPU cache)
Clone it once into a new array
Repeatedly clone the newest copy into another new array, using that copy as the source for the next iteration, a given number of times
Additionally, after each of these steps the array is iterated three times and the needed time is measured for each iteration. Here is the code:
private static long[] read(byte[] array, int count, boolean logTimes) {
long[] times = null;
if (logTimes) {
times = new long[count];
}
int sum = 0;
for (int n = 0; n < count; n++) {
long start = System.nanoTime();
for (int i = 0; i < array.length; i++) {
sum += array[i];
}
if (logTimes) {
long time = System.nanoTime() - start;
times[n] = time;
}
}
System.out.println(sum);
return times;
}
public static void main(String[] args) {
int arraySize = Integer.parseInt(args[0]);
int clones = Integer.parseInt(args[1]);
byte[] array = new byte[arraySize];
long[] initialReadTimes = read(array, 3, true);
// Fill with some non-zero content
for (int i = 0; i < array.length; i++) {
array[i] = (byte) i;
}
long[] afterWriteTimes = read(array, 3, true);
// Make this array important, so it lands in CPU Cache
read(array, 1_000_000, false);
long[] afterReadTimes = read(array, 3, true);
long[] afterFirstCloneReadTimes = null;
byte[] copy = new byte[array.length];
System.arraycopy(array, 0, copy, 0, array.length);
for (int i = 1; i <= clones; i++) {
byte[] copy2 = new byte[copy.length];
System.arraycopy(copy, 0, copy2, 0, copy.length);
copy = copy2;
if (i == 1) {
afterFirstCloneReadTimes = read(array, 3, true);
}
}
long[] afterAllClonesReadTimes = read(array, 3, true);
// Write to CSV
...
System.out.println("Finished.");
}
We ran this benchmark with arraysize=10,000 and clones=10,000,000 on a 2nd gen i5 with 16 GB RAM:
There was quite a lot of variation, though: the 2nd and 3rd passes sometimes had different times, or there were peaks in the 2nd and 3rd passes of the last read benchmark.
These results seem pretty confusing. I think they could show that upon array initialization the array is not immediately loaded into the CPU cache, because the initial read times are relatively high. After writing, nothing seems to have changed. Only after iterating a lot do the access times become faster, while the first pass is always slower (because of the measuring overhead that runs between the readings?). Also, cloning/filling memory with new arrays does not seem to have any impact at all. Could anyone explain these results?
We assumed that some of this might stem from java specific memory management, so we tried to reimplement the benchmark in C++:
void read(unsigned char array[], int length, int count, std::vector<long int> & logTimes) {
for (int c = 0; c < count; c++) {
int sum = 0;
std::chrono::high_resolution_clock::time_point t1;
if (count <= 3) {
t1 = std::chrono::high_resolution_clock::now();
}
for (int i = 0; i < length; i++) {
sum += array[i];
}
if (count <= 3) {
std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
long int duration = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
std::cout << duration << " ns\n";
logTimes.push_back(duration);
}
}
}
int main(int argc, char ** args)
{
int ARRAYSIZE = 10000;
int CLONES = 10000000;
std::vector<long int> initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes, null;
unsigned char array[ARRAYSIZE];
read(array, ARRAYSIZE, 3, initialTimes);
for (long long i = 0; i < ARRAYSIZE; i++) {
array[i] = i;
}
std::cout << "Reads after writing:\n";
read(array, ARRAYSIZE, 3, afterWritingTimes);
read(array, ARRAYSIZE, 1000000, null);
std::cout << "Reads after 1M Reads:\n";
read(array, ARRAYSIZE, 3, afterReadTimes);
unsigned char copy[ARRAYSIZE];
unsigned char * ptr_copy = copy;
std::memcpy(ptr_copy, array, ARRAYSIZE);
for (long long i = 0; i < CLONES; i++) {
unsigned char copy2[ARRAYSIZE];
std::memcpy(copy2, ptr_copy, ARRAYSIZE);
ptr_copy = copy2;
if (i == 0) {
read(array, ARRAYSIZE, 3, afterFirstCloneTimes);
}
}
std::cout << "Reads after cloning:\n";
read(array, ARRAYSIZE, 3, afterCloneTimes);
writeTimesToCSV(initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes);
std::cout << "Finished.\n";
}
Using the same parameters, we got the following results:
So in C++ the times are rather similar to each other, with some strange peaks in the 2nd pass. This seems to suggest that the faster timings above were caused by Java optimizations (or rather by suboptimal handling in the first readings). Does this mean that the CPU cache is not involved at all?
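One effect worth isolating before reasoning about the cache (this is my own sketch, not part of the original benchmark): on HotSpot, the summing loop starts out interpreted and is JIT-compiled only after enough iterations, so the first timed passes largely measure compiler warmup rather than memory placement. Running the same pass repeatedly makes the drop visible:

```java
// Minimal warmup demonstration: time the same summing loop many times.
// On a typical HotSpot JVM the early passes are much slower than the
// later ones, with the array contents and size held constant.
public class WarmupDemo {
    // Sums the array `runs` times, printing the time taken by each pass.
    // The accumulated sum is returned so the JIT cannot elide the loop.
    static long timedSum(byte[] array, int runs) {
        long sum = 0;
        for (int run = 0; run < runs; run++) {
            long start = System.nanoTime();
            for (int i = 0; i < array.length; i++) sum += array[i];
            System.out.println("run " + run + ": " + (System.nanoTime() - start) + " ns");
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] array = new byte[10_000];
        for (int i = 0; i < array.length; i++) array[i] = (byte) i;
        System.out.println("checksum " + timedSum(array, 20));
    }
}
```

Running the same program with -Xint (interpreter only) flattens the difference, which helps separate JIT effects from cache effects.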
I just implemented the two algorithms and I was surprised when I plotted the results! The recursive implementation is clearly faster than the iterative one.
After that, I added insertion sort combined with both of them and the result was the same.
In the lectures we usually see that recursion is slower than iteration, as with factorial calculation, but here that doesn't seem to be the case. I'm pretty sure my code is right. What's the explanation for this behaviour? It looks as if Java (10) automatically parallelises the recursion, because when I display the little animation, the insertion sort works in parallel with the merge operations.
If this code is not enough to understand, here is my GitHub: Github
EDIT RELOADED
As said in the comments, I should compare things that are similar, so now the merge method is the same in the iterative and recursive versions.
private void merge(ArrayToSort<T> array, T[] sub_array,
int min, int mid, int max) {
//we make a copy of the array.
if (max + 1 - min >= 0) System.arraycopy(array.array, min, sub_array, min, max + 1 - min);
int i = min, j = mid + 1;
for (var k = min; k <= max; k++) {
if (i > mid) {
array.array[k] = sub_array[j++];
} else if (j > max) {
array.array[k] = sub_array[i++];
} else if (sub_array[j].compareTo(sub_array[i]) < 0) {
array.array[k] = sub_array[j++];
} else {
array.array[k] = sub_array[i++];
}
}
}
Sort Recursive:
public void Sort(ArrayToSort<T> array) {
T sub[] = (T[]) new Comparable[array.Length];
sort(array, sub, 0, array.Length - 1);
}
private InsertionSort<T> insertionSort = new InsertionSort<>();
private void sort(ArrayToSort<T> array, T[] sub_array, int min, int max) {
if (max <= min) return;
if (max <= min + 8 - 1) {
insertionSort.Sort(array, min, max);
return;
}
var mid = min + (max - min) / 2;
sort(array, sub_array, min, mid);
sort(array, sub_array, mid + 1, max);
merge(array, sub_array, min, mid, max);
}
Sort Iterative:
private InsertionSort<T> insertionSort = new InsertionSort<>();
public void Sort(ArrayToSort<T> array) {
int length = array.Length;
int maxIndex = length - 1;
T temp[] = (T[]) new Comparable[length];
for (int i = 0; i < maxIndex; i += 8) {
insertionSort.Sort(array, i, Integer.min(i + 8 - 1, maxIndex));
}
System.arraycopy(array.array, 0, temp, 0, length);
for (int m = 8; m <= maxIndex; m = 2 * m) {
for (int i = 0; i < maxIndex; i += 2 * m) {
merge(array, temp, i, i + m - 1,
Integer.min(i + 2 * m - 1, maxIndex));
}
}
}
In the new plot we can see that the difference is now proportional (up to a constant factor). If someone has any more ideas... Thanks a lot :)
The new plot
And here is my (teacher's, in fact) method to plot:
for (int i = 0; i < nbSteps; i++) {
int N = startingCount + countIncrement * i;
for (ISortingAlgorithm<Integer> algo : algorithms) {
long time = 0;
for (int j = 0; j < folds; j++) {
ArrayToSort<Integer> toSort = new ArrayToSort<>(
ArrayToSort.CreateRandomIntegerArray(N, Integer.MAX_VALUE, (int) System.nanoTime())
);
long startTime = System.currentTimeMillis();
algo.Sort(toSort);
long endTime = System.currentTimeMillis();
time += (endTime - startTime);
assert toSort.isSorted();
}
stringBuilder.append(N + ", " + (time / folds) + ", " + algo.Name() + "\n");
System.out.println(N + ", " + (time / folds) + ", " + algo.Name());
}
}
I don't think I have an answer because I didn't try your code.
I will give you thoughts:
a) CPUs have an L1 cache and instruction prefetching. The recursive version may have better locality of reference when all the sorts are done and it is finishing with a bunch of merges while popping all the frames (or for other CPU optimization reasons).
b) Meanwhile, the JIT compiler does crazy things to recursion, particularly due to tail recursion and inlining. I suggest you try without the JIT compiler, just for fun. You might also want to try changing the thresholds for JIT compilation, so the code gets compiled sooner and the warmup time is minimized.
c) System.arraycopy is a native method and, despite being optimized, it still has some overhead.
d) The iterative version seems to have more arithmetic in the loops.
e) This is an attempt at micro-benchmarking. You need to factor out the GC and have tests running dozens if not hundreds of times. Read up on JMH. Also try different GCs and -Xmx settings.
I'm not very experienced with Rust and I'm trying to diagnose a performance problem. Below is a pretty fast Java program (it runs in 7 seconds) and what I think should be the equivalent Rust code. However, the Rust code runs very slowly (yes, I compiled it with --release as well), and it also appears to overflow. Changing i32 to i64 just pushes the overflow later, but it still happens. I suspect there is some bug in what I wrote, but after staring at the problem for a long time, I decided to ask for help.
public class Blah {
static final int N = 100;
static final int K = 50;
public static void main(String[] args) {
//initialize S
int[] S = new int[N];
for (int n = 1; n <= N; n++) S[n-1] = n*n;
// compute maxsum and minsum
int maxsum = 0;
int minsum = 0;
for (int n = 0; n < K; n++) {
minsum += S[n];
maxsum += S[N-n-1];
}
// initialize x and y
int[][] x = new int[K+1][maxsum+1];
int[][] y = new int[K+1][maxsum+1];
y[0][0] = 1;
// bottom-up DP over n
for (int n = 1; n <= N; n++) {
x[0][0] = 1;
for (int k = 1; k <= K; k++) {
int e = S[n-1];
for (int s = 0; s < e; s++) x[k][s] = y[k][s];
for (int s = 0; s <= maxsum-e; s++) {
x[k][s+e] = y[k-1][s] + y[k][s+e];
}
}
int[][] t = x;
x = y;
y = t;
}
// sum of unique K-subset sums
int sum = 0;
for (int s = minsum; s <= maxsum; s++) {
if (y[K][s] == 1) sum += s;
}
System.out.println(sum);
}
}
extern crate ndarray;
use ndarray::prelude::*;
use std::mem;
fn main() {
let numbers: Vec<i32> = (1..101).map(|x| x * x).collect();
let deg: usize = 50;
let mut min_sum: usize = 0;
for i in 0..deg {
min_sum += numbers[i] as usize;
}
let mut max_sum: usize = 0;
for i in deg..numbers.len() {
max_sum += numbers[i] as usize;
}
// Make an array
let mut x = OwnedArray::from_elem((deg + 1, max_sum + 1), 0i32);
let mut y = OwnedArray::from_elem((deg + 1, max_sum + 1), 0i32);
y[(0, 0)] = 1;
for n in 1..numbers.len() + 1 {
x[(0, 0)] = 1;
println!("Completed step {} out of {}", n, numbers.len());
for k in 1..deg + 1 {
let e = numbers[n - 1] as usize;
for s in 0..e {
x[(k, s)] = y[(k, s)];
}
for s in 0..max_sum - e + 1 {
x[(k, s + e)] = y[(k - 1, s)] + y[(k, s + e)];
}
}
mem::swap(&mut x, &mut y);
}
let mut ans = 0;
for s in min_sum..max_sum + 1 {
if y[(deg, s)] == 1 {
ans += s;
}
}
println!("{}", ans);
}
To diagnose a performance issue in general, I:
Get a baseline time or rate. Preferably create a testcase that only takes a few seconds, as profilers tend to slow down the system a bit. You will also want to iterate frequently.
Compile in release mode with debugging symbols.
Run the code in a profiler. I'm on OS X so my main choice is Instruments, but I also use valgrind.
Find the hottest code path, think about why it's slow, try something, measure.
The last step is the hard part.
In your case, you have a separate implementation that you can use as your baseline. Comparing the two implementations, we can see that your data structures differ. In Java, you are building nested arrays, but in Rust you are using the ndarray crate. I know that crate has a good maintainer, but I personally don't know anything about the internals of it, or what use cases it best fits.
So I rewrote it using the standard-library Vec.
The other thing I know is that direct array access isn't as fast as using an iterator. This is because array access needs to perform a bounds check, while iterators bake the bounds check into themselves. Many times this means using methods on Iterator.
The other change is to perform bulk data transfer when you can. Instead of copying element-by-element, move whole slices around, using methods like copy_from_slice.
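The same bulk-versus-element distinction exists on the Java side of this comparison, where System.arraycopy (a JIT intrinsic) plays roughly the role of Rust's copy_from_slice; this is my own illustration, not code from either program:

```java
// Illustration: copying element-by-element versus in bulk. Both produce
// identical contents; System.arraycopy is a JVM intrinsic and typically
// wins on large arrays, analogous to copy_from_slice in Rust.
public class BulkCopy {
    static boolean copiesMatch(int n) {
        int[] src = new int[n];
        for (int i = 0; i < n; i++) src[i] = i;

        int[] a = new int[n];
        for (int i = 0; i < n; i++) a[i] = src[i]; // element-by-element

        int[] b = new int[n];
        System.arraycopy(src, 0, b, 0, n);          // bulk copy

        return java.util.Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        System.out.println(copiesMatch(1_000_000)); // true
    }
}
```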
With those changes the code looks like this (apologies for poor variable names, I'm sure you can come up with semantic names for them):
use std::mem;
const N: usize = 100;
const DEGREE: usize = 50;
fn main() {
let numbers: Vec<_> = (1..N+1).map(|v| v*v).collect();
let min_sum = numbers[..DEGREE].iter().fold(0, |a, &v| a + v as usize);
let max_sum = numbers[DEGREE..].iter().fold(0, |a, &v| a + v as usize);
// different data types for x and y!
let mut x = vec![vec![0; max_sum+1]; DEGREE+1];
let mut y = vec![vec![0; max_sum+1]; DEGREE+1];
y[0][0] = 1;
for &e in &numbers {
let e2 = max_sum - e + 1;
let e3 = e + e2;
x[0][0] = 1;
for k in 0..DEGREE {
let current_x = &mut x[k+1];
let prev_y = &y[k];
let current_y = &y[k+1];
// bulk copy
current_x[0..e].copy_from_slice(&current_y[0..e]);
// more bulk copy
current_x[e..e3].copy_from_slice(&prev_y[0..e2]);
// avoid array index
for (x, y) in current_x[e..e3].iter_mut().zip(&current_y[e..e3]) {
*x += *y;
}
}
mem::swap(&mut x, &mut y);
}
let sum = y[DEGREE][min_sum..max_sum+1].iter().enumerate().filter(|&(_, &v)| v == 1).fold(0, |a, (i, _)| a + i + min_sum);
println!("{}", sum);
println!("{}", sum == 115039000);
}
2.060s - Rust 1.9.0
2.225s - Java 1.7.0_45-b18
On OS X 10.11.5 with a 2.3 GHz Intel Core i7.
I'm not experienced enough with Java to know what kinds of optimizations it can do automatically.
The biggest potential next step I see is to leverage SIMD instructions when performing the addition; it's pretty much exactly what SIMD is made for.
As pointed out by Eli Friedman, avoiding array indexing by zipping isn't currently the most performant way of doing this.
With the changes below, the time is now 1.267s.
let xx = &mut current_x[e..e3];
xx.copy_from_slice(&prev_y[0..e2]);
let yy = &current_y[e..e3];
for i in 0..(e3-e) {
xx[i] += yy[i];
}
This generates assembly that appears to unroll the loop as well as using SIMD instructions:
+0x9b0 movdqu -48(%rsi), %xmm0
+0x9b5 movdqu -48(%rcx), %xmm1
+0x9ba paddd %xmm0, %xmm1
+0x9be movdqu %xmm1, -48(%rsi)
+0x9c3 movdqu -32(%rsi), %xmm0
+0x9c8 movdqu -32(%rcx), %xmm1
+0x9cd paddd %xmm0, %xmm1
+0x9d1 movdqu %xmm1, -32(%rsi)
+0x9d6 movdqu -16(%rsi), %xmm0
+0x9db movdqu -16(%rcx), %xmm1
+0x9e0 paddd %xmm0, %xmm1
+0x9e4 movdqu %xmm1, -16(%rsi)
+0x9e9 movdqu (%rsi), %xmm0
+0x9ed movdqu (%rcx), %xmm1
+0x9f1 paddd %xmm0, %xmm1
+0x9f5 movdqu %xmm1, (%rsi)
+0x9f9 addq $64, %rcx
+0x9fd addq $64, %rsi
+0xa01 addq $-16, %rdx
+0xa05 jne "slow::main+0x9b0"
Recently I made a Singleton class, which had a method returning the instance myInstance of class Singleton. It was something like:
private final static Singleton myInstance = new Singleton();
After that I coded the entire constructor, which was private, let's say:
private Singleton(){
doStuff()
}
However, the performance was terrible. Maybe someone can give me a hint why doStuff() is much slower than when I don't use the Singleton? I guess it has something to do with calling the constructor while declaring variables, but can someone share some info about that?
I have no idea why that is; I tried to search for an explanation, but I couldn't find one.
Edit: the doStuff function includes things like opening files, reading them, and running regexps on them, plus a Levenshtein function (which, according to the profiler, was the slowest part of the code).
When running that Levenshtein function from the constructor while using the Singleton, it took around 10 seconds. After the object was created, a call to this function on the Singleton object took only 0.5 seconds. Now, when not using the Singleton, calling the Levenshtein function from the constructor also takes 0.5 seconds, versus the 10 seconds it took via the Singleton. The code for the function is as follows:
["odleglosci" is just a simple map]
private static int getLevenshteinDistance(String s, String t) {
int n = s.length(); // length of s
int m = t.length(); // length of t
int p[] = new int[n + 1]; //'previous' cost array, horizontally
int d[] = new int[n + 1]; // cost array, horizontally
int _d[]; //placeholder to assist in swapping p and d
// indexes into strings s and t
int i; // iterates through s
int j; // iterates through t
char t_j; // jth character of t
int cost; // cost
for (i = 0; i <= n; i++) {
p[i] = i * 2;
}
int add = 2;//how much to add per increase
char[] c = new char[2];
String st;
for (j = 1; j <= m; j++) {
t_j = t.charAt(j - 1);
d[0] = j;
for (i = 1; i <= n; i++) {
cost = s.charAt(i - 1) == t_j ? 0 : Math.min(i, j) > 1 ? (s.charAt(i - 1) == t.charAt(j - 2) ? (s.charAt(i - 2) == t.charAt(j - 1) ? 0 : 1) : 1) : 1; // adjustment to reduce the cost of transposed characters
if (cost == 1) {
c[0] = s.charAt(i - 1);
c[1] = t_j;
st = new String(c);
if (!odleglosci.containsKey(st)) {
//print((int) c[0]);
//print((int) c[1]);
} else if (odleglosci.get(st) > 1) {
cost = 2;
}
} else {
c[0] = s.charAt(i - 1);
c[1] = t_j;
st = new String(c);
if (!odleglosci.containsKey(st)) {
// print((int) c[0]);
// print((int) c[1]);
} else if (odleglosci.get(st) > 1) {
cost = -1;
}
}
d[i] = Math.min(Math.min(d[i - 1] + 2, p[i] + 2), p[i - 1] + cost);
}
_d = p;
p = d;
d = _d;
}
return p[n];
}
I didn't think the code here had any relevance to the question I asked; that's why I did not include it before, sorry.
The reason it is slow is because doStuff() is slow.
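If the goal is simply not to pay for doStuff() at class-initialization time, one standard option is the initialization-on-demand holder idiom, which defers construction to the first getInstance() call. This is a sketch; the names are illustrative and doStuff() is a placeholder for the asker's file/regexp work:

```java
// Initialization-on-demand holder idiom: the Holder class (and therefore
// the expensive constructor) is only initialized when getInstance() is
// first called, not when LazySingleton itself is loaded.
public class LazySingleton {
    static boolean constructed = false; // for demonstration only

    private LazySingleton() {
        constructed = true;
        doStuff(); // expensive work happens lazily, on first use
    }

    private void doStuff() {
        // placeholder for the real initialization
    }

    private static class Holder {
        static final LazySingleton INSTANCE = new LazySingleton();
    }

    public static LazySingleton getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("before first use, constructed = " + constructed);
        LazySingleton s = getInstance();
        System.out.println("after first use, constructed = " + constructed);
    }
}
```

The JLS guarantees this is thread-safe without synchronization, since class initialization is performed once, under a lock taken by the JVM.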