Best way to parallelize this Java code

How would I go about parallelizing this piece of code using threads in Java? It extracts the contours from an image and creates a new image containing only those contours.
import java.io.*;
import java.awt.image.*;
import javax.imageio.ImageIO;
import java.awt.Color;
public class Contornos {
    static int h, w;
    static long debugTime; // System.nanoTime() returns long; storing it in a float loses precision

    public static void main(String[] args) {
        try {
            File fichImagen = new File("test.jpg");
            BufferedImage image = ImageIO.read(fichImagen);
            w = image.getWidth();
            h = image.getHeight();
            int[] inicial = new int[w * h];
            int[] resultadoR = new int[w * h];
            int[] resultadoG = new int[w * h];
            int[] resultadoB = new int[w * h];
            int[][] procesarR = new int[h][w];
            int[][] procesarG = new int[h][w];
            int[][] procesarB = new int[h][w];
            int[][] procesarBN = new int[h][w];
            int[][] binaria = new int[h][w];
            int[] resultado = new int[w * h];
            image.getRGB(0, 0, w, h, inicial, 0, w);
            for (int i = 0; i < w * h; i++) {
                Color c = new Color(inicial[i]);
                resultadoR[i] = c.getRed();
                resultadoG[i] = c.getGreen();
                resultadoB[i] = c.getBlue();
            }
            int k = 0;
            for (int i = 0; i < h; i++) {
                for (int j = 0; j < w; j++) {
                    procesarR[i][j] = resultadoR[k];
                    procesarG[i][j] = resultadoG[k];
                    procesarB[i][j] = resultadoB[k];
                    k++;
                }
            }
            for (int i = 0; i < h; i++) {
                for (int j = 0; j < w; j++) {
                    procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);
                }
            }
            binaria = extraerContornos(procesarBN);
            k = 0;
            for (int i = 0; i < h; i++) {
                for (int j = 0; j < w; j++) {
                    resultado[k++] = binaria[i][j];
                }
            }
            image.setRGB(0, 0, w, h, resultado, 0, w);
            ImageIO.write(image, "JPG", new File("allJPG.jpg"));
        } catch (IOException e) {
            e.printStackTrace(); // don't swallow I/O failures silently
        }
    }

    static void debugStart() {
        debugTime = System.nanoTime();
    }

    static void debugEnd() {
        long elapsedTime = System.nanoTime() - debugTime;
        System.out.println((elapsedTime / 1_000_000.0) + " ms");
    }
    private static int[][] extraerContornos(int[][] matriz) {
        debugStart(); // was missing: without it, debugEnd() measures from time zero
        int modx, mody;
        int[][] sobelx = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
        int[][] sobely = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
        int[][] modg = new int[h][w];
        double[][] theta = new double[h][w];
        int[][] thetanor = new int[h][w];
        int[][] contorno = new int[h][w];
        int umbral = 10;
        int superan = 0, ncontorno = 0;
        double t;
        int signo;
        int uno, dos;
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                if (i == 0 || i == h - 1 || j == 0 || j == w - 1) {
                    modg[i][j] = 0;
                    theta[i][j] = 0.0;
                    thetanor[i][j] = 0;
                } else {
                    modx = 0;
                    mody = 0;
                    for (int k = -1; k <= 1; k++) {
                        for (int l = -1; l <= 1; l++) {
                            modx += matriz[i + k][j + l] * sobelx[k + 1][l + 1];
                            mody += matriz[i + k][j + l] * sobely[k + 1][l + 1];
                        }
                    }
                    modx = modx / 4;
                    mody = mody / 4;
                    modg[i][j] = (int) Math.sqrt(modx * modx + mody * mody);
                    theta[i][j] = Math.atan2(mody, modx);
                    thetanor[i][j] = (int) (theta[i][j] * 256.0 / (2.0 * Math.PI));
                }
            }
        }
        for (int i = 1; i < h - 1; i++) {
            for (int j = 1; j < w - 1; j++) {
                contorno[i][j] = 0;
                if (modg[i][j] >= umbral) {
                    superan++;
                    t = Math.tan(theta[i][j]);
                    if (t >= 0.0) {
                        signo = 1;
                    } else {
                        signo = -1;
                    }
                    if (Math.abs(t) < 1.0) {
                        uno = interpolar(modg[i][j + 1], modg[i - signo][j + 1], t);
                        dos = interpolar(modg[i][j - 1], modg[i + signo][j - 1], t);
                    } else {
                        t = 1 / t;
                        uno = interpolar(modg[i - 1][j], modg[i - 1][j + signo], t);
                        dos = interpolar(modg[i + 1][j], modg[i + 1][j - signo], t);
                    }
                    if (modg[i][j] > uno && modg[i][j] >= dos) {
                        ncontorno++;
                        contorno[i][j] = 255;
                    }
                }
            }
        }
        debugEnd();
        return contorno;
    }

    private static int interpolar(int valor1, int valor2, double tangente) {
        return (int) (valor1 + (valor2 - valor1) * Math.abs(tangente));
    }
}
I believe I can use threads in the extraerContornos method (for the for loops) and join() them at the end to get the results, but that's just my guess.
Would that be a correct way to parallelize this? Any tips, in general, on how to know when and where you should start parallelizing any code?

Tips in general on how to know when and where you should start parallelizing any code?
Well, never start parallelizing any code without quantitatively supported evidence that it will improve system performance.
Never ever,
even if any academics or wannabe gurus tell you to do so.
First collect a fair amount of evidence that it makes any sense at all, and how big a positive edge such code re-engineering will bring over the original, pure-[SERIAL] code-execution flow.
It is like in nature or in business -- who will ever pay a single cent more for getting the same result?
Who will pay X man-hours of work at current salary rates for just a first 1.01x improvement in performance? (Not to mention the would-be parallel heroes who manage to deliver even worse than the original performance, because of previously unseen, hidden costs of add-on overheads.) Who will ever pay for that?
How do you start to analyse the possible benefits versus the negative impacts?
First of all, try to understand the "mechanics" of how the layered, composite system -- [ O/S kernel, programming language, user program ] -- can orchestrate either "just"-[CONCURRENT] or true-[PARALLEL] process-scheduling.
Without knowing this, one can never quantify the actual costs of entry. Sometimes people pay all of those costs without ever realising that the resulting processing flow is not even "just"-[CONCURRENT]. (Think of Python's central, concurrency-preventing, exclusive-LOCK-based GIL: it can help mask some sorts of I/O latencies, but it never improves CPU-bound processing performance, while one still pays the immense costs of spawning full copies of the process execution-environment plus the Python-internal state -- all that for receiving nothing at the end. Nothing. Yes, that is how badly things can go when poor or missing knowledge precedes a naive "go parallelize" activism.)
Once you feel comfortable with the operating-system "mechanics" available for spawning threads and processes, you can guesstimate -- or better, benchmark -- the costs of doing so, and start working quantitatively: knowing how many [ns] one will have to pay to spawn the first, second, ... thirty-ninth child thread or separate O/S process, and what the add-on costs will be of some higher-level language construct that fans out a herd of threads/processes, distributes some amount of work, and finally collects the heaps of results back to the original requestor (the high-level syntax of .map(...){...}, .foreach(...){...} et al., whose lower levels do all the dirty job hidden from the sight of the user-program designer -- not to mention the "just"-coders, who spend zero effort on a fully responsible understanding of the "mechanics" and the cost "economy" of their "just"-coded work).
Without knowing the actual costs in [ns] (not depicted in Fig. 1 for clarity and brevity, though in principle always present; they are detailed and discussed in the trailer sections), it makes almost no sense to try to read, and to understand in its full depth and code-design context, the criticism of Amdahl's Law.
It is so easy to pay more than one will receive at the end ...
For more details on this risk, check this and follow the link from the first paragraph, leading to a fully interactive GUI-simulator of the actual costs of overheads, once introduced into the costs/benefits formula.
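As a toy illustration of that risk (my own sketch, not taken from the linked simulator), an overhead-aware variant of Amdahl's formula can be computed directly. Here p is the parallelizable fraction, n the number of workers, and o the add-on overhead expressed as a fraction of the original runtime:

```java
// Toy overhead-aware Amdahl estimate (illustration only):
//   speedup = 1 / ( (1 - p) + p/n + o )
// p = parallelizable fraction, n = workers,
// o = add-on overhead as a fraction of the original serial runtime.
public class AmdahlOverhead {
    static double speedup(double p, int n, double o) {
        return 1.0 / ((1.0 - p) + p / n + o);
    }
}
```

With p = 0.5 and a 60% overhead, even 1000 workers leave you below 1.0x, i.e. slower than the serial original.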
Back to your code:
The Sobel-filter kernel introduces non-local dependencies for any naive thread-mapping, so it is better to start with a simple section, where absolute independence is directly visible.
Fusing the loops may save all the repetitive for(){...}-constructor overheads and increase performance:
for ( int i = 0; i < h; i++ ) {
for ( int j = 0; j < w; j++ ) {
Color c = new Color( inicial[i * w + j] );
procesarBN[i][j] = (int) ( 0.2989 * c.getRed()
+ 0.5870 * c.getGreen()
+ 0.1140 * c.getBlue()
);
}
}
Instead of these triple-for(){...}-s:
for (int i = 0; i < w * h; i++) {
    Color c = new Color(inicial[i]);
    resultadoR[i] = c.getRed();
    resultadoG[i] = c.getGreen();
    resultadoB[i] = c.getBlue();
}
int k = 0;
for (int i = 0; i < h; i++) {
    for (int j = 0; j < w; j++) {
        procesarR[i][j] = resultadoR[k];
        procesarG[i][j] = resultadoG[k];
        procesarB[i][j] = resultadoB[k];
        k++;
    }
}
for (int i = 0; i < h; i++) {
    for (int j = 0; j < w; j++) {
        procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);
    }
}
Effects?
In the [SERIAL] part of Amdahl's Law:
at net-zero add-on costs: improved / eliminated 2/3 of the for(){...}-constructor looping overhead costs
at net-zero add-on costs: improved / eliminated the ( 4 * h * w * 3 )-memIO ( i.e. not paying ~ h * w * 1.320+ [us] each !!! )
at net-zero add-on costs: improved / eliminated the ( 4 * h * w * 3 * 4 )-memALLOCs, again saving a remarkable amount of resources in both the [TIME] and [SPACE] polynomially-scaled domains of the complexity-ZOO taxonomy.
You may also feel safe running these in a [CONCURRENT] fashion, as this pixel-value processing is principally independent here ( but not in the Sobel, nor in the contour-detector algorithm ).
So, here,
any [CONCURRENT] or [PARALLEL] process-scheduling may help, if
at some non-zero add-on cost, the processing gets to harness multiple computing resources ( more than the 1 CPU-core that was operated in the original, pure-[SERIAL] code-execution ), once the work has been safely pixel-grid-mapped onto such ( available, resource-backed ) thread-pool or other code-processing facility.
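As a sketch of that safe, pixel-independent mapping (my own code, assuming the question's packed int[] pixel layout, not part of the original answer), the grayscale step can be partitioned by rows across plain threads and join()-ed at the end, since no two workers ever touch the same row:

```java
// Hypothetical sketch: row-partitioned grayscale conversion with raw threads.
// Each worker owns a disjoint band of rows, so no synchronization is needed.
import java.util.ArrayList;
import java.util.List;

public class ParallelGray {
    static int[][] toGray(int[] argb, int h, int w, int nThreads) {
        int[][] gray = new int[h][w];
        List<Thread> threads = new ArrayList<>();
        int chunk = (h + nThreads - 1) / nThreads;   // rows per worker, rounded up
        for (int t = 0; t < nThreads; t++) {
            final int from = t * chunk;
            final int to = Math.min(h, from + chunk);
            Thread worker = new Thread(() -> {
                for (int i = from; i < to; i++) {    // whole rows belong to one worker
                    for (int j = 0; j < w; j++) {
                        int p = argb[i * w + j];
                        int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                        gray[i][j] = (int) (0.2989 * r + 0.5870 * g + 0.1140 * b);
                    }
                }
            });
            threads.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : threads) worker.join();   // wait for all partitions
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return gray;
    }
}
```

Whether this pays off still depends on the image size versus the thread-spawning costs discussed above; benchmark before adopting it.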
Yet,
any attempt to go non-[SERIAL] makes sense if and only if the lump sum of all the process-allocation / deallocation and other add-on costs is at least justified by an increased amount of [CONCURRENT]-ly processed calculations.
Paying more than receiving is definitely not a smart move...
So: benchmark, benchmark, and benchmark, before deciding what may have a positive effect on production code.
Always try to get improvements in the pure-[SERIAL] sections, as these have zero-add-on costs and yet may reduce the overall processing time.
Q.E.D. above.

Related

Advanced 2-D Array map generation for 2D Platformer game with Java (processing)

I'm trying to figure out a way to make my simple map generation better. So the question, to be specific, is: how do I make my random 2-D array platform generator less random, so it looks more like a platformer game level, while adding a few more variables to give each generated map some variation? Explanation below.
Currently, I have this:
void GenerateMap() {
    int x = width/30;
    int y = height/28;
    int[][] map = new int[x][y];
    //println(map.length);
    for (int i = 0; i < map.length; i++) {
        for (int j = 0; j < map[i].length; j++) {
            float val = random(0, 100);
            float nv = noise(val);
            println(nv);
            if (j <= (int) (.5 * map.length)) {
                map[i][j] = nv < .3 ? 0 : 1;
            } else if (j >= (int) (.7 * map.length)) {
                map[i][j] = nv < .6 ? 1 : 0;
            } else {
                map[i][j] = nv <= .3 ? 0 : 1;
            }
            println(i +"-" + j + " - rowcol: " + map[i][j]);
        }
    }
    levelMap = map;
    JSONArray saveArr = arrayToJson(levelMap);
    saveJSONArray(saveArr, dataPath("./maps/Generation_" + mapGenerations + ".json"), "compact");
}
It builds a 2-D array of 1s and 0s. I tried to make it less random so it looks like playable terrain: the top-most section will be 0s (air), the bottom-most will be 1s (platform), and the middle will have a mix to make floating platforms and such. So far I've got nothing.
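One thing worth checking (my own observation, sketched with a hypothetical smooth-noise stand-in for Processing's noise()): sampling noise at a fresh random point per cell, as noise(random(0, 100)) does, gives no spatial coherence between neighbouring cells, so the output stays pure noise. Sampling at scaled grid coordinates, e.g. something like noise(i * 0.15, j * 0.15), produces smoothly varying values and therefore connected terrain regions:

```java
// Hypothetical stand-in for Processing's noise(): deterministic 2-D value
// noise. Neighbouring grid samples vary smoothly, unlike noise(random(...)).
import java.util.Random;

public class CoherentNoise {
    // Hash a lattice point to a repeatable pseudo-random value in [0, 1).
    static double lattice(int x, int y) {
        long h = x * 73856093L ^ y * 19349663L;
        return new Random(h).nextDouble();
    }

    static double smooth(double t) { return t * t * (3 - 2 * t); } // smoothstep fade

    // Bilinear smoothstep interpolation of the four surrounding lattice values.
    static double valueNoise(double x, double y) {
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        double tx = smooth(x - x0), ty = smooth(y - y0);
        double a = lattice(x0, y0),     b = lattice(x0 + 1, y0);
        double c = lattice(x0, y0 + 1), d = lattice(x0 + 1, y0 + 1);
        double top = a + (b - a) * tx, bottom = c + (d - c) * tx;
        return top + (bottom - top) * ty;
    }
}
```

A cell could then be classified as map[i][j] = valueNoise(i * 0.15, j * 0.15) < 0.5 ? 0 : 1, which tends to produce blobs of air and platform rather than per-cell speckle; in Processing itself, noise(i * 0.15, j * 0.15) plays the same role.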
The next idea that popped up was to have the numbers go from 0 to N. This works as follows:
0 will always be air,
1 will always be a platform,
2 can be a platform of a different image, let's say a diagonal platform to form a slope off the edge
3 can be a collectible or anything
...etc
My issue is I can't figure out how to generate the numbers in a way that keeps the map from looking like a mess. I came up with a setup for my variables:
public int[] toplayer = null, // [0,0,0,0,...,0]
bottomlayer = null, // [1,1,1,1,...,1]
variation = {0,1,2}, // the 0-N
variationRatios = {60, 30, 10}; // this would specify the frequency/probability of each variation show up, respectively, in a random() function
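The variationRatios idea above can be sketched as a cumulative weighted pick (a hypothetical helper, not from the question's code): a roll drawn from [0, 100) is mapped to the variation whose ratio band it falls into:

```java
// Hypothetical sketch of the variationRatios idea: cumulative weighted pick.
// ratios = {60, 30, 10} means variation 0 owns rolls 0-59, 1 owns 60-89, 2 owns 90-99.
public class WeightedPick {
    static int pickVariation(int[] ratios, int roll) {
        int acc = 0;
        for (int i = 0; i < ratios.length; i++) {
            acc += ratios[i];              // cumulative upper bound of this band
            if (roll < acc) return i;      // roll falls inside this band
        }
        return ratios.length - 1;          // guard in case roll == total
    }
}
```

The roll itself can come from (int) random(0, 100) in Processing, or from a seeded java.util.Random if you want reproducible maps.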
After all, that's done, then I run a function to replace all values with their respective "Sprite" and then display them on the screen.
void drawPlatform(int amt, PVector pos, PVector size) {
    for (int i = 0; i < amt; i++) {
        //new Platform(new PVector(abs(width-((pos.x+(size.x/2)) + (i * size.x))), pos.y+(size.y/2)), size);
        Platform p = new Platform(new PVector(abs(((pos.x+(size.x/2)) + (i * size.x))), pos.y+(size.y/2)), size);
        if (platforms.indexOf(p) % 5 == 0 && !p.bouncy) {
            p.bouncy = true;
        }
    }
}

JSONArray arrayToJson(int[][] arr) {
    JSONArray ja = new JSONArray();
    for (int[] ar : arr) {
        JSONArray j = new JSONArray();
        ja.append(j);
        for (int a : ar) {
            j.append(a);
        }
    }
    return ja;
}

void LoadMap(int[][] map) {
    drawPlatform(width/30, new PVector(0, 0), new PVector(30, 30)); // top
    for (int i = 0; i < map.length; i++) {
        for (int j = 0; j < map[i].length; j++) {
            if (map[i][j] == 0) {
                new Platform(new PVector((i*30)+15, (j*30)+45), new PVector(30, 30));
                //drawPlatform(1, new PVector(i+j*30, (i+j)*30), new PVector(30, 30));
            }
        }
    }
    drawPlatform(width/30, new PVector(0, height-30), new PVector(30, 30)); // bottom
    println(gameObjects.size() + " GameObjects loaded");
    println(platforms.size() + " Platforms loaded");
}
Overall, this is how it's used:
GenerateMap();
LoadMap(levelMap);
mapGenerations++;
This is how it would end up looking (obviously different each generation): [screenshot omitted]
But right now, I have something like this: [screenshot omitted]
I've looked everywhere for a couple of days now, for a solution and can't find anything as specific as my request. If you have any ideas, please let me know.
Quick note: of course, we could have the user write the array by hand, but that's not time-effective enough. I could also build a visual map editor, but that's a lot more code and more time than I have available. I am not opposed to this idea, however.

use multi-threading to generate a matrix in java to utilize all CPU cores of a supercomputer

I am working on a problem that deals with a large amount of data and computation. For that purpose we have a supercomputer with a processing power of 270 TFLOPS. Our data is in matrix form, so we decided to divide the generation of the matrix into several parts using threads. The problem is that we are using arguments to divide the task, but a thread's run() method does not take arguments.
static int start = 0, end;
int row = 10000;
int col = 10000, count = 0, m = 2203, n = 401;
double p = Math.PI, t = Math.sqrt(3), pi = (p / (col + 1));
double mul, a, b, c, d, e, f, xx, yy;
int[][] matrix = new int[row][col];

private void sin1()
{
    // TODO Auto-generated method
    for (int i = start; i < row; i++)
    {
        for (int j = 0; j < col; j++)
        {
            xx = ((i + 1) * pi);
            yy = ((j + 1) * pi);
            a = Math.cos(((2 * m) - n) * ((2 * (xx)) / 3));
            b = Math.sin(((2 * (n * (yy))) / t));
            c = Math.cos(((((2 * n) - m) * (2 * (xx))) / 3));
            d = Math.sin(((2 * m) * (yy) / t));
            e = Math.cos((((m + n) * (2 * (xx))) / 3));
            f = Math.sin((((m - n) * (2 * (yy))) / t));
            mul = (a * b) - (c * d) + (e * f);
            if (mul < 0)
            {
                matrix[i][j] = 0;
            }
            else
            {
                matrix[i][j] = 1;
            }
            System.out.print(matrix[i][j]);
        }
        System.out.println();
    }
}
We are at first testing it with 10 million values.
The code makes it clear to me that you lack experience with Java. That is a bad thing if you want to write code for a supercomputer. Luckily, Java has a good set of tools to solve all kinds of problems, but you need to know which tool fits which situation.
In your case you can, e.g., use parallel streams to spread the generation across cores like this:
import java.util.stream.IntStream; // needed for the stream pipeline below

static final int start = 0;
static int end;
static final int row = 10000;
static final int col = 10000, count = 0, m = 2203, n = 401;
static final double t = Math.sqrt(3);
static final double pi = (Math.PI / (col + 1));
final int[][] matrix = new int[row][col];

public int generateMatrixEntry(final int i, final int j) {
    final double xx = ((i + 1) * pi);
    final double yy = ((j + 1) * pi);
    final double a = Math.cos(((2 * m) - n) * ((2 * (xx)) / 3));
    final double b = Math.sin(((2 * (n * (yy))) / t));
    final double c = Math.cos(((((2 * n) - m) * (2 * (xx))) / 3));
    final double d = Math.sin(((2 * m) * (yy) / t));
    final double e = Math.cos((((m + n) * (2 * (xx))) / 3));
    final double f = Math.sin((((m - n) * (2 * (yy))) / t));
    final double mul = (a * b) - (c * d) + (e * f);
    return (mul < 0) ? 0 : 1;
}

private void sin1() {
    IntStream.range(start, row).parallel().forEach((i) -> {
        for (int j = 0; j < col; j++) {
            matrix[i][j] = generateMatrixEntry(i, j);
        }
    });
}
This is however just one possible solution that might or might not fit your hardware. You absolutely need someone with deeper Java knowledge to select the right tools from the set for you if the above does not solve your issues.
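One property worth noting about that pattern (my own check, with a stand-in formula instead of generateMatrixEntry): each lambda invocation writes only to its own row i, so there are no data races, and the parallel fill produces exactly the same matrix as a serial one:

```java
// Small self-contained check: writing disjoint rows from a parallel IntStream
// matches the serial fill, because no two lambda invocations share a row.
import java.util.stream.IntStream;

public class ParallelFill {
    static int[][] fill(int rows, int cols, boolean parallel) {
        int[][] m = new int[rows][cols];
        IntStream range = IntStream.range(0, rows);
        (parallel ? range.parallel() : range).forEach(i -> {
            // stand-in for generateMatrixEntry(i, j)
            for (int j = 0; j < cols; j++) m[i][j] = (i * 31 + j) % 7;
        });
        return m;
    }
}
```

The System.out.print calls from the original sin1() were deliberately dropped above: interleaved printing from a parallel stream would be both scrambled and a serial bottleneck.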

Diagnosing a performance issue

I'm not very experienced with Rust and I'm trying to diagnose a performance problem. Below there is a pretty fast Java program (runs in 7 seconds) and what I think should be the equivalent Rust code. However, the Rust code runs very slowly (yes, I compiled it with --release as well), and it also appears to overflow. Changing i32 to i64 just pushes the overflow later, but it still happens. I suspect there is some bug in what I wrote, but after staring at the problem for a long time, I decided to ask for help.
public class Blah {
    static final int N = 100;
    static final int K = 50;
    public static void main(String[] args) {
        //initialize S
        int[] S = new int[N];
        for (int n = 1; n <= N; n++) S[n-1] = n*n;
        // compute maxsum and minsum
        int maxsum = 0;
        int minsum = 0;
        for (int n = 0; n < K; n++) {
            minsum += S[n];
            maxsum += S[N-n-1];
        }
        // initialize x and y
        int[][] x = new int[K+1][maxsum+1];
        int[][] y = new int[K+1][maxsum+1];
        y[0][0] = 1;
        // bottom-up DP over n
        for (int n = 1; n <= N; n++) {
            x[0][0] = 1;
            for (int k = 1; k <= K; k++) {
                int e = S[n-1];
                for (int s = 0; s < e; s++) x[k][s] = y[k][s];
                for (int s = 0; s <= maxsum-e; s++) {
                    x[k][s+e] = y[k-1][s] + y[k][s+e];
                }
            }
            int[][] t = x;
            x = y;
            y = t;
        }
        // sum of unique K-subset sums
        int sum = 0;
        for (int s = minsum; s <= maxsum; s++) {
            if (y[K][s] == 1) sum += s;
        }
        System.out.println(sum);
    }
}
extern crate ndarray;
use ndarray::prelude::*;
use std::mem;

fn main() {
    let numbers: Vec<i32> = (1..101).map(|x| x * x).collect();
    let deg: usize = 50;
    let mut min_sum: usize = 0;
    for i in 0..deg {
        min_sum += numbers[i] as usize;
    }
    let mut max_sum: usize = 0;
    for i in deg..numbers.len() {
        max_sum += numbers[i] as usize;
    }
    // Make an array
    let mut x = OwnedArray::from_elem((deg + 1, max_sum + 1), 0i32);
    let mut y = OwnedArray::from_elem((deg + 1, max_sum + 1), 0i32);
    y[(0, 0)] = 1;
    for n in 1..numbers.len() + 1 {
        x[(0, 0)] = 1;
        println!("Completed step {} out of {}", n, numbers.len());
        for k in 1..deg + 1 {
            let e = numbers[n - 1] as usize;
            for s in 0..e {
                x[(k, s)] = y[(k, s)];
            }
            for s in 0..max_sum - e + 1 {
                x[(k, s + e)] = y[(k - 1, s)] + y[(k, s + e)];
            }
        }
        mem::swap(&mut x, &mut y);
    }
    let mut ans = 0;
    for s in min_sum..max_sum + 1 {
        if y[(deg, s)] == 1 {
            ans += s;
        }
    }
    println!("{}", ans);
}
To diagnose a performance issue in general, I:
Get a baseline time or rate. Preferably create a testcase that only takes a few seconds, as profilers tend to slow down the system a bit. You will also want to iterate frequently.
Compile in release mode with debugging symbols.
Run the code in a profiler. I'm on OS X so my main choice is Instruments, but I also use valgrind.
Find the hottest code path, think about why it's slow, try something, measure.
The last step is the hard part.
In your case, you have a separate implementation that you can use as your baseline. Comparing the two implementations, we can see that your data structures differ. In Java, you are building nested arrays, but in Rust you are using the ndarray crate. I know that crate has a good maintainer, but I personally don't know anything about its internals or which use cases it fits best.
So I rewrote it using the standard-library Vec.
The other thing I know is that direct array access isn't as fast as using an iterator. This is because array access needs to perform a bounds check, while iterators bake the bounds check into themselves. Many times this means using methods on Iterator.
The other change is to perform bulk data transfer when you can. Instead of copying element-by-element, move whole slices around, using methods like copy_from_slice.
With those changes the code looks like this (apologies for poor variable names, I'm sure you can come up with semantic names for them):
use std::mem;

const N: usize = 100;
const DEGREE: usize = 50;

fn main() {
    let numbers: Vec<_> = (1..N+1).map(|v| v*v).collect();
    let min_sum = numbers[..DEGREE].iter().fold(0, |a, &v| a + v as usize);
    let max_sum = numbers[DEGREE..].iter().fold(0, |a, &v| a + v as usize);
    // different data types for x and y!
    let mut x = vec![vec![0; max_sum+1]; DEGREE+1];
    let mut y = vec![vec![0; max_sum+1]; DEGREE+1];
    y[0][0] = 1;
    for &e in &numbers {
        let e2 = max_sum - e + 1;
        let e3 = e + e2;
        x[0][0] = 1;
        for k in 0..DEGREE {
            let current_x = &mut x[k+1];
            let prev_y = &y[k];
            let current_y = &y[k+1];
            // bulk copy
            current_x[0..e].copy_from_slice(&current_y[0..e]);
            // more bulk copy
            current_x[e..e3].copy_from_slice(&prev_y[0..e2]);
            // avoid array index
            for (x, y) in current_x[e..e3].iter_mut().zip(&current_y[e..e3]) {
                *x += *y;
            }
        }
        mem::swap(&mut x, &mut y);
    }
    let sum = y[DEGREE][min_sum..max_sum+1].iter().enumerate().filter(|&(_, &v)| v == 1).fold(0, |a, (i, _)| a + i + min_sum);
    println!("{}", sum);
    println!("{}", sum == 115039000);
}
2.060s - Rust 1.9.0
2.225s - Java 1.7.0_45-b18
On OS X 10.11.5 with a 2.3 GHz Intel Core i7.
I'm not experienced enough with Java to know what kinds of optimizations it can do automatically.
The biggest potential next step I see is to leverage SIMD instructions when performing the addition; it's pretty much exactly what SIMD is made for.
As pointed out by Eli Friedman, avoiding array indexing by zipping isn't currently the most performant way of doing this.
With the changes below, the time is now 1.267s.
let xx = &mut current_x[e..e3];
xx.copy_from_slice(&prev_y[0..e2]);
let yy = &current_y[e..e3];
for i in 0..(e3-e) {
    xx[i] += yy[i];
}
This generates assembly that appears to unroll the loop as well as using SIMD instructions:
+0x9b0 movdqu -48(%rsi), %xmm0
+0x9b5 movdqu -48(%rcx), %xmm1
+0x9ba paddd %xmm0, %xmm1
+0x9be movdqu %xmm1, -48(%rsi)
+0x9c3 movdqu -32(%rsi), %xmm0
+0x9c8 movdqu -32(%rcx), %xmm1
+0x9cd paddd %xmm0, %xmm1
+0x9d1 movdqu %xmm1, -32(%rsi)
+0x9d6 movdqu -16(%rsi), %xmm0
+0x9db movdqu -16(%rcx), %xmm1
+0x9e0 paddd %xmm0, %xmm1
+0x9e4 movdqu %xmm1, -16(%rsi)
+0x9e9 movdqu (%rsi), %xmm0
+0x9ed movdqu (%rcx), %xmm1
+0x9f1 paddd %xmm0, %xmm1
+0x9f5 movdqu %xmm1, (%rsi)
+0x9f9 addq $64, %rcx
+0x9fd addq $64, %rsi
+0xa01 addq $-16, %rdx
+0xa05 jne "slow::main+0x9b0"

Newton's Method for finding Complex Roots in Java

I got a project in my Java class which I'm having trouble with.
The project is basically marking coordinates on the screen, making a (complex) polynomial out of them, then solving the polynomial with Newton's method using random guesses and drawing the path of the guesses on the screen.
I don't have a problem with any of the drawing, marking, etc.
But for some reason, my Newton's method algorithm randomly misses roots. Sometimes it hits none of them, sometimes it misses one or two. I've been changing stuff up for hours now but I couldn't really come up with a solution.
When a root is missed, the value I get in the array usually diverges toward positive or negative infinity (very large numbers).
Any help would be really appreciated.
// Polynomial evaluation method.
public Complex evalPoly(Complex complexArray[], Complex guess) {
    Complex result = new Complex(0, 0);
    for (int i = 0; i < complexArray.length; i++) {
        result = result.gaussMult(guess).addComplex(complexArray[complexArray.length - i - 1]);
    }
    return result;
}
// Polynomial differentiation method.
public Complex[] diff(Complex[] comp) {
    Complex[] result = new Complex[comp.length - 1];
    for (int j = 0; j < result.length; j++) {
        result[j] = new Complex(0, 0);
    }
    for (int i = 0; i < result.length - 1; i++) {
        result[i].real = comp[i + 1].real * (i + 1);
        result[i].imaginary = comp[i + 1].imaginary * (i + 1);
    }
    return result;
}
// Method which eliminates some of the things that I don't want to go into the array
public boolean rootCheck2(Complex[] comps, Complex comp) {
    double accLim = 0.01;
    // NOTE: "comp.real == Double.NaN" is always false in Java; NaN must be tested with Double.isNaN()
    if (Double.isNaN(comp.real) || Double.isNaN(comp.imaginary))
        return false;
    if (Double.isInfinite(comp.real) || Double.isInfinite(comp.imaginary))
        return false;
    for (int i = 0; i < comps.length; i++) {
        if (Math.abs(comp.real - comps[i].real) < accLim && Math.abs(comp.imaginary - comps[i].imaginary) < accLim)
            return false;
    }
    return true;
}
// Method which finds (or attempts to find) all of the roots
public Complex[] addUnique2(Complex[] poly, Bitmap bitmapx, Paint paint, Canvas canvasx) {
    Complex[] rootsC = new Complex[poly.length - 1];
    int iterCount = 0;
    int iteLim = 20000;
    for (int i = 0; i < rootsC.length; i++) {
        rootsC[i] = new Complex(0, 0);
    }
    while (iterCount < iteLim && MainActivity.a < rootsC.length) {
        double guess = -492 + 984 * rand.nextDouble();
        double guess2 = -718 + 1436 * rand.nextDouble();
        if (rootCheck2(rootsC, findRoot2(poly, new Complex(guess, guess2), bitmapx, paint, canvasx))) {
            rootsC[MainActivity.a] = findRoot2(poly, new Complex(guess, guess2), bitmapx, paint, canvasx);
            MainActivity.a = MainActivity.a + 1;
        }
        iterCount = iterCount + 1;
    }
    return rootsC;
}
// Method which finds a single root of the complex polynomial.
public Complex findRoot2(Complex[] comp, Complex guess, Bitmap bitmapx, Paint paint, Canvas canvasx) {
    int iterCount = 0;
    double accLim = 0.001;
    int itLim = 20000;
    Complex[] diffedComplex = diff(comp);
    while (Math.abs(evalPoly(comp, guess).real) >= accLim && Math.abs(evalPoly(comp, guess).imaginary) >= accLim) {
        if (iterCount >= itLim) {
            return new Complex(Double.NaN, Double.NaN);
        }
        if (evalPoly(diffedComplex, guess).real == 0 || evalPoly(diffedComplex, guess).imaginary == 0) {
            return new Complex(Double.NaN, Double.NaN);
        }
        iterCount = iterCount + 1;
        guess.real = guess.subtractComplex(evalPoly(comp, guess).divideComplex(evalPoly(diffedComplex, guess))).real;
        guess.imaginary = guess.subtractComplex(evalPoly(comp, guess).divideComplex(evalPoly(diffedComplex, guess))).imaginary;
        drawCircles((float) guess.real, (float) guess.imaginary, paint, canvasx, bitmapx);
    }
    return guess;
}
// Drawing method
void drawCircles(float x, float y, Paint paint, Canvas canvasx, Bitmap bitmapx) {
    canvasx.drawCircle(x + 492, shiftBackY(y), 5, paint);
    coordPlane.setAdjustViewBounds(false);
    coordPlane.setImageBitmap(bitmapx);
}
}
Error 1
The lines
guess.real = guess.subtractComplex(evalPoly(comp, guess).divideComplex(evalPoly(diffedComplex, guess))).real;
guess.imaginary = guess.subtractComplex(evalPoly(comp, guess).divideComplex(evalPoly(diffedComplex, guess))).imaginary;
first introduce a needless complication, and second introduce an error that makes the iteration deviate from Newton's method: the guess used in the second line differs from the guess used in the first line, since the real part has already changed.
Why not use, as in the evaluation procedure, the complex assignment in
guess = guess.subtractComplex(evalPoly(comp, guess).divideComplex(evalPoly(diffedComplex, guess)));
Error 2 (Update)
In the computation of the differentiated polynomial, you are missing the highest degree term in
for (int i = 0; i < result.length - 1; i++) {
    result[i].real = comp[i + 1].real * (i + 1);
    result[i].imaginary = comp[i + 1].imaginary * (i + 1);
It should be either i < result.length or i < comp.length - 1. Using the wrong derivative will of course lead to unpredictable results in the iteration.
On root bounds and initial values
To each polynomial you can assign an outer root bound such as
R = 1+max(abs(c[0:N-1]))/abs(c[N])
Using 3*N points, random or equidistant, on or close to this circle should increase the probability of reaching each of the roots.
But the usual way to find all of the roots is to use polynomial deflation, that is, splitting off the linear factors corresponding to the root approximations already found. A couple of additional Newton steps using the full polynomial then restores maximal accuracy.
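The deflation step can be sketched with synthetic division. This is a real-coefficient illustration (a hypothetical helper; coefficients in descending degree order, my own convention here): the Complex version is structurally identical, with complex multiply and add in place of the double operations:

```java
// Real-coefficient sketch of polynomial deflation by a found root r
// (synthetic division). coeffs are in descending degree order: {c_n, ..., c_0}.
public class Deflate {
    static double[] deflate(double[] coeffs, double r) {
        double[] q = new double[coeffs.length - 1]; // quotient, one degree lower
        q[0] = coeffs[0];
        for (int i = 1; i < q.length; i++) {
            q[i] = coeffs[i] + r * q[i - 1];        // Horner-style carry
        }
        // remainder would be coeffs[n] + r * q[n-1], which is ~0 when r is a root
        return q;
    }
}
```

Deflating x^2 - 3x + 2 by the root r = 1 leaves the quotient x - 2; after deflating, a couple of Newton steps on the full polynomial polish each root, as described above.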
Newton fractals
Each root has a basin (domain) of attraction with fractal boundaries between the domains. Rebuilding a situation similar to the one in the question, I computed a Newton fractal showing that the attraction to two of the roots, and the ignorance of the other two, is a feature of the mathematics behind it, not an error in the implementation of the Newton method.
Different shades of the same color belong to the domain of the same root, where brightness corresponds to the number of steps used to reach the white areas around the roots.

Trying to port python code to Java but getting different results

I'm sure I'm making a rookie mistake with Java (this is actually my first program). I am trying to port some working Python code I made into Java (as a learning/testing exercise to see some of the differences), but I'm getting different results between the two.
My program takes a list of data and generates another list based on it (basically it checks whether a value can be broken down into a sum). Python correctly gives 2,578 results while Java only gives 12. I tried to find the equivalent commands in Java and thought I had, but I can't figure out why the results differ. (I have previously had problems with multi-threading and variable synchronization; I wasn't sure if Java was doing anything behind the scenes, so I added a while loop to keep running until the results stabilize, but it didn't help.) Any suggestions would be helpful.
Here's the offending code (Java at the top; the Python and pseudo code are commented out at the bottom as reference):
for (int c = 0; c <= max_value; c++) {
    String temp_result = (s - c * data.get(i) + "," + i);
    if (results.contains(temp_result)) {
        String result_to_add = (s + "," + i+1);
        if (results.contains(result_to_add)) {
            System.out.println("contains result already");
        } else {
            results.add(result_to_add);
        }
    }
}
/*
print len(T)
#Here's the basic pseudo code (I added a few control variables, but here's a high-level view):
for i = 1 to k
    for z = 0 to sum:
        for c = 1 to z / x_i:
            if T[z - c * x_i][i - 1] is true:
                set T[z][i] to true
*/
In Java, s + "," + i+1 is a String concatenation: "10" + "," + 4 + 1 will return "10,41".
Use String result_to_add = s + "," + (i + 1); instead.
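A minimal illustration of that precedence difference (hypothetical helper names):

```java
// The + operator evaluates left to right: once a String appears,
// every following + becomes concatenation, not arithmetic.
public class ConcatDemo {
    static String buggy(int s, int i) { return s + "," + i + 1; }   // appends "1" as text
    static String fixed(int s, int i) { return s + "," + (i + 1); } // arithmetic first
}
```

buggy(10, 4) yields "10,41" while fixed(10, 4) yields "10,5".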
I see you've solved it just now, but since I've written it already, here's my version:
This uses the trick of using a java.awt.Point as a substitute for a 2-element Python list/tuple of ints, which (coincidentally) bypasses your String-concatenation issue.
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;

public class Sums
{
    public static void main(String[] args)
    {
        List<Point> T = new ArrayList<>();
        T.add(new Point(0, 0));
        int target_sum = 100;
        int[] data = new int[] { 10, -2, 5, 50, 20, 25, 40 };
        float max_percent = 1;
        int R = (int) (target_sum * max_percent * data.length);
        for (int i = 0; i < data.length; i++)
        {
            for (int s = -R; s < R + 1; s++)
            {
                int max_value = (int) Math.abs((target_sum * max_percent) / data[i]);
                for (int c = 0; c < max_value + 1; c++)
                {
                    if (T.contains(new Point(s - c * data[i], i)))
                    {
                        Point p = new Point(s, i + 1);
                        if (!T.contains(p))
                        {
                            T.add(p);
                        }
                    }
                }
            }
        }
        System.out.println(T.size());
    }
}
