Implementing Adaboost for multiple dimensions in Java


I'm working on an AdaBoost implementation in Java.
It should work for "double" coordinates in 2D, 3D or 10D.
All I have found for Java handles binary data (0, 1), not a multi-dimensional space.
I'm looking for suggestions on how to represent the multidimensional space in Java, and how to initialize the classifiers for boosting.
The data values lie in the range [-15, +15], and the target values are 1 or 2.

To use a boosted decision tree on spatial data, the typical approach is to try to find a "partition point" on some axis that minimizes the residual information in the two subtrees. To do this, you find some value along some axis (say, the x axis) and then split the data points into two groups - one group of points whose x coordinate is below that split point, and one group of points whose x coordinate is above that split point. That way, you convert the real-valued spatial data into 0/1 data - the 0 values are the ones below the split point, and the 1 values are the ones above the split point. The algorithm is thus identical to AdaBoost, except that when choosing the axis to split on, you also have to consider potential splitting points.
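The split search described above can be sketched as a decision stump trained on weighted points. This is a minimal illustration, not a full AdaBoost implementation: the class name DecisionStump and its fields are hypothetical, points are assumed to be stored as double[] feature vectors, the labels are assumed to be remapped from 1/2 to -1/+1, and the weights are assumed to sum to 1. It favors clarity over speed by rescanning all points for every candidate threshold.

```java
import java.util.Arrays;

/** Hypothetical weak learner: a threshold test on a single coordinate. */
class DecisionStump {
    final int axis;             // which coordinate is tested
    final double threshold;     // split point on that axis
    final int polarity;         // +1: predict +1 above the threshold; -1: the reverse
    final double weightedError; // weighted training error of this split

    DecisionStump(int axis, double threshold, int polarity, double weightedError) {
        this.axis = axis;
        this.threshold = threshold;
        this.polarity = polarity;
        this.weightedError = weightedError;
    }

    /** Returns +1 or -1 for a point of any dimension. */
    int predict(double[] x) {
        return x[axis] > threshold ? polarity : -polarity;
    }

    /**
     * Tries every axis and every midpoint between consecutive distinct values,
     * keeping the split with the lowest weighted error.
     * Assumes labels are +1/-1 and weights are non-negative and sum to 1.
     * Returns null if no axis has two distinct values.
     */
    static DecisionStump train(double[][] points, int[] labels, double[] weights) {
        int dims = points[0].length;
        DecisionStump best = null;
        for (int axis = 0; axis < dims; axis++) {
            final int a = axis;
            double[] values = Arrays.stream(points).mapToDouble(p -> p[a]).sorted().toArray();
            for (int i = 0; i + 1 < values.length; i++) {
                if (values[i] == values[i + 1]) {
                    continue; // no room for a split between equal values
                }
                double threshold = (values[i] + values[i + 1]) / 2.0;
                double err = 0.0; // weighted error when predicting +1 above the threshold
                for (int j = 0; j < points.length; j++) {
                    int predicted = points[j][axis] > threshold ? 1 : -1;
                    if (predicted != labels[j]) {
                        err += weights[j];
                    }
                }
                int polarity = 1;
                if (err > 0.5) { // the flipped stump is better
                    err = 1.0 - err;
                    polarity = -1;
                }
                if (best == null || err < best.weightedError) {
                    best = new DecisionStump(axis, threshold, polarity, err);
                }
            }
        }
        return best;
    }
}
```

In a boosting loop, a stump like this would be retrained in each round after the point weights have been updated, and the same code works unchanged for 2D, 3D or 10D data because the dimension is read from the arrays.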

How about using JBoost? I think it has what you're looking for.

Why don't you use a double[] array for each object? That is the common way of representing feature vectors in Java.

Related

Calculating the principal axis/eigenvalues and -vectors of large dataset in Java

I have a large dataset (>500.000 elements) that contains the stress values (σ_xx, σ_yy, σ_zz, τ_xy, τ_yz, τ_xz) of FEM-Elements. These stress values are given in the global xyz-coordinate space of the model. I want to calculate the main axis stress values and directions from those. If you're not that familiar with the physics behind it, this means taking the symmetric matrix
| σ_xx τ_xy τ_xz |
| τ_xy σ_yy τ_yz |
| τ_xz τ_yz σ_zz |
and calculating its eigenvalues and eigenvectors. Calculating each set of eigenvalues and -vectors on its own is too slow. I'm looking for a library, an algorithm or something in Java that would allow me to do this as array calculations. As an example, in python/numpy I could just take all my 3x3-matrices, stack them along a third dimension to get a nx3x3-array, and pass that to np.linalg.eig(arr), and it automatically gives me an nx3-array for the three eigenvalues and an nx3x3-array for the three eigenvectors.
Things I tried:
nd4j has an Eigen-module for calculating eigenvalues and -vectors, but only supports a single square array at a time.
Calculate the characteristic polynomial and use Cardano's formula to get the roots/eigenvalues - possible to do for the whole array at once, but I'm stuck now on how to get the corresponding eigenvectors. Is there maybe a general simple algorithm to get from those to the eigenvectors?
Looking for an analytical form of the eigenvalues and -vectors that can be calculated directly: It does exist, but just no.

You'll need to write a little code.
I'd create or use a Matrix class as a dependency and find methods to give you eigenvalues and eigenvectors. The ones you found in nd4j sound like great candidates. You might also consider the Linear Algebra For Java (LA4J) dependency.
Load the dataset into a List<Matrix>.
Use functional Java methods to apply a map to give you a List of eigenvalues as a vector per stress matrix and a List of eigenvectors as a matrix per stress matrix.
You can speed this calculation up by applying the map function to a parallel stream. Java will then parallelize the calculation under the covers to make use of the available cores.
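As a rough sketch of the stream approach, the example below maps a list of stress tensors to their eigenvalues with parallelStream(). To stay self-contained it does not call nd4j or LA4J; instead it swaps in the well-known closed-form trigonometric expression for the eigenvalues of a real symmetric 3x3 matrix. The class name PrincipalStresses and the array layout {σ_xx, σ_yy, σ_zz, τ_xy, τ_yz, τ_xz} are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PrincipalStresses {
    /**
     * Eigenvalues, largest first, of the symmetric stress tensor given as
     * s = {sxx, syy, szz, txy, tyz, txz} (assumed layout).
     */
    static double[] eigenvalues(double[] s) {
        double sxx = s[0], syy = s[1], szz = s[2], txy = s[3], tyz = s[4], txz = s[5];
        double p1 = txy * txy + tyz * tyz + txz * txz;
        if (p1 == 0.0) {
            // Already diagonal: the eigenvalues are the diagonal entries.
            double[] d = {sxx, syy, szz};
            Arrays.sort(d);
            return new double[] {d[2], d[1], d[0]};
        }
        double q = (sxx + syy + szz) / 3.0;
        double p2 = (sxx - q) * (sxx - q) + (syy - q) * (syy - q) + (szz - q) * (szz - q) + 2.0 * p1;
        double p = Math.sqrt(p2 / 6.0);
        // B = (A - q*I) / p
        double b11 = (sxx - q) / p, b22 = (syy - q) / p, b33 = (szz - q) / p;
        double b12 = txy / p, b23 = tyz / p, b13 = txz / p;
        double det = b11 * (b22 * b33 - b23 * b23)
                   - b12 * (b12 * b33 - b23 * b13)
                   + b13 * (b12 * b23 - b22 * b13);
        // Clamp to [-1, 1] to guard against rounding error before acos.
        double r = Math.max(-1.0, Math.min(1.0, det / 2.0));
        double phi = Math.acos(r) / 3.0;
        double e1 = q + 2.0 * p * Math.cos(phi);
        double e3 = q + 2.0 * p * Math.cos(phi + 2.0 * Math.PI / 3.0);
        double e2 = 3.0 * q - e1 - e3; // the trace equals the sum of the eigenvalues
        return new double[] {e1, e2, e3};
    }

    /** Maps every tensor to its eigenvalues in parallel; the result keeps the input order. */
    static List<double[]> allEigenvalues(List<double[]> stresses) {
        return stresses.parallelStream()
                       .map(PrincipalStresses::eigenvalues)
                       .collect(Collectors.toList());
    }
}
```

Whether the parallel version is actually faster depends on the data size and the machine, so it is worth timing it against a plain stream() on the real dataset.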

Follow-up: This is the way that worked best for me, as I can do all operations without iterating over every element. As stated above, I'm using Nd4j, which seems to be limited in its possibilities compared to numpy (or maybe I just didn't read the documentation thoroughly enough). The following method uses only basic array operations:
1. From the given stress values, calculate the eigenvalues using Cardano's formula. Only element-wise instructions are needed to do that (add, sub, mul, div, pow). The result should be three vectors of size n, each containing one eigenvalue for all elements.
2. Use the formula given here to calculate the matrix S for each eigenvalue. Like step 1, this can also be done using only element-wise operations on the stress-value and eigenvalue vectors, which avoids having to specify complicated instructions about which array to multiply along which axis.
3. Take one column from S and normalize it to get a normalized eigenvector for the given eigenvalue.
Note that this method only works if you have a real symmetric matrix. You also should make sure to properly deal with cases where the same eigenvalue appears multiple times.
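The link to the formula for S is not preserved here. One standard construction that matches the description, and is assumed in the sketch below, is the product S = (A - λ_j I)(A - λ_k I), where λ_j and λ_k are the two other eigenvalues: for a real symmetric matrix, every non-zero column of S is an eigenvector for the remaining eigenvalue. The class name EigenvectorSketch and the tolerance are hypothetical, and the example handles a single matrix rather than the element-wise array version used in the follow-up.

```java
class EigenvectorSketch {
    /**
     * Returns a unit eigenvector of the symmetric 3x3 matrix a for the eigenvalue
     * that is NOT passed in, given the other two eigenvalues lambdaJ and lambdaK.
     * Assumes S = (A - lambdaJ*I)(A - lambdaK*I); throws if S is numerically zero,
     * which happens when the wanted eigenvalue is repeated.
     */
    static double[] eigenvector(double[][] a, double lambdaJ, double lambdaK) {
        double[][] m1 = shifted(a, lambdaJ);
        double[][] m2 = shifted(a, lambdaK);
        double[][] s = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    s[i][j] += m1[i][k] * m2[k][j];
                }
            }
        }
        // Pick the column with the largest norm to avoid normalizing a zero column.
        int bestCol = 0;
        double bestNorm = -1.0;
        for (int c = 0; c < 3; c++) {
            double norm = Math.sqrt(s[0][c] * s[0][c] + s[1][c] * s[1][c] + s[2][c] * s[2][c]);
            if (norm > bestNorm) {
                bestNorm = norm;
                bestCol = c;
            }
        }
        if (bestNorm < 1e-12) { // hypothetical tolerance, not scale-aware
            throw new IllegalArgumentException("S is zero: repeated eigenvalue, handle separately");
        }
        return new double[] {
            s[0][bestCol] / bestNorm, s[1][bestCol] / bestNorm, s[2][bestCol] / bestNorm
        };
    }

    /** Returns a - lambda * I without modifying a. */
    private static double[][] shifted(double[][] a, double lambda) {
        double[][] m = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                m[i][j] = a[i][j] - (i == j ? lambda : 0.0);
            }
        }
        return m;
    }
}
```

For the matrix with rows (2, 0, 0), (0, 3, 4), (0, 4, 9), whose eigenvalues are 1, 2 and 11, passing 1 and 2 yields the eigenvector for 11, which is proportional to (0, 1, 2).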

Plotting points from two arrays to a graph

I am solving the TSP using simulated annealing, and I want to plot the optimum distance against the temperature at which it occurs, joining the points to see the shape of the graph.
I have the distances and temperatures in two separate arrays, and I need to plot them as a scatter plot. If plotting requires putting the values into one array, that can be done, but how do I plot such a graph? I tried using LibreOffice to plot it, but that isn't working at all; the app keeps crashing.
while (temp > 1) {
    // some code giving distance
    // Cool system
    temp *= 1 - coolingRate;
    System.out.println("" + best.getDistance());
    System.out.println("" + temp);
    // Outputs to be put in an array and plotted
}
Edit 1:
Both arrays are one-dimensional, and the graph I want to plot has points whose x and y coordinates are taken from these arrays in order.
I can't figure out a way to do it.

Search your code to see whether 'temp' is defined in more than one way. temp is a common variable name that is also often used as a temporary holding place while you crunch the numbers. Has it been previously defined?

Difference between RowMatrix and Matrix in Apache Spark?

I want to know the basic difference between RowMatrix and Matrix class available in Apache Spark.

A slightly more precise question would be: what is the difference between mllib.linalg.Matrix and mllib.linalg.distributed.DistributedMatrix?
Matrix is a trait which represents local matrices that reside in the memory of a single machine. For now there are two basic implementations: DenseMatrix and SparseMatrix.
DistributedMatrix is a trait which represents distributed matrices built on top of RDDs. RowMatrix is a subclass of DistributedMatrix which stores data in a row-wise manner without meaningful row ordering. There are other implementations of DistributedMatrix (like IndexedRowMatrix, CoordinateMatrix and BlockMatrix), each with its own storage strategy and specific set of methods. See for example Matrix Multiplication in Apache Spark.

This is going to come down a little to the idioms of the language / framework / discipline you're using, but in computer science, an array is a one-dimensional "list" of "things" that can be referenced by their position in the list. One of the things that can be in the list is another array, which lets you make arrays of arrays (of arrays of arrays ...), giving you a data set of arbitrarily large dimension.
A matrix comes from linear algebra and is a two-dimensional representation of data (which can be represented by an array of arrays) that comes with a powerful set of mathematical operations that allow you to manipulate the data in interesting ways. While arrays can vary in size, the width and height of a matrix are generally known based on the specific type of operations you're going to perform.
Matrices are used extensively in 3D graphics and physics engines because they are a fast, convenient way of representing transformation and acceleration data for objects in three dimensions.
Array : Collection of homogeneous elements.
Matrix : A simple row and column thing.
Both are different things in different spaces.
But in computer programming, a collection of single-dimension arrays can be termed a matrix.
You can represent a 2D array (i.e., a collection of single-dimension arrays) in matrix form.
Example
A[2][3] : This means A is a collection of 2 single-dimension arrays, each of size 3.
A[1,1] A[1,2] A[1,3] // This is a single-dimensional array
A[2,1] A[2,2] A[2,3] // This is another single-dimensional array
// The collection is a multi-dimensional, or 2D, array.
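In Java the same idea can be shown directly, since a 2D array really is an array whose elements are 1D arrays. The sketch below is only an illustration with a hypothetical class name; note that Java indices start at 0, so the element written A[1,1] above is a[0][0] here.

```java
class ArrayOfArrays {
    /** Builds a 2 x 3 array whose entry at (row, col) encodes its 1-based position, e.g. 23. */
    static int[][] build() {
        int[][] a = new int[2][3]; // 2 one-dimensional arrays, each of length 3
        for (int row = 0; row < a.length; row++) {
            int[] single = a[row]; // each row is itself an int[]
            for (int col = 0; col < single.length; col++) {
                single[col] = 10 * (row + 1) + (col + 1);
            }
        }
        return a;
    }
}
```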

Storing and Retrieving points (related x and y) quickly in Java. List vs Array

I am attempting to recreate a board game in Java which involves me storing a set of valid places pieces can be placed (for the AI). I thought that perhaps instead of storing as a list of Points, it would be run-time faster if I had an array/list/dictionary of the X coordinates in which there was an array/list of the y coordinates, so once you found the x coordinate you would only have to check its Ys not all the remaining points'.
The trouble I have is that I must change the valid points often. I came up with some possible solutions but have difficulty picking/implementing them:
HashMap<Integer, ArrayList<Integer>> with X as an integer key and the Ys as an ArrayList.
Problem: I would have to create a new ArrayList every time I add an X.
Also, I am unsure about the runtime performance of HashMap.
An int[X][Y] array initialized to the board size, with each valid point set at its own location (point 2,3 sets [2][3]) and unset points holding an invalid integer.
Problem: I would have to iterate through all the points and check every point.
List of Points: this would simply be a linked or array list of Points.
Problem: Lists are slower than arrays.
How would using a linked list of Points compare to checking the whole array like above?
Perhaps I should use a 2D linked list? What would be the fastest way to do this at runtime?

You're worrying about the wrong things. Accessing collection/map/array items is extremely fast. The graphical part will be way more performance-sensitive. Just use whatever data structure is most natural. It's unlikely that you're going to be storing enough items to really matter anyway. Build it first, then figure out where your performance problems really are.

If you use an ArrayList of Points, you get nearly the same performance as with an array (in Java),
and I think this is the fastest solution: as you already mentioned, the int array has to be iterated completely, and with a HashMap the dependent ArrayLists have to be updated whenever coordinates are added or changed.
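A minimal sketch of the ArrayList-of-points idea follows. The classes Cell and ValidMoves are hypothetical stand-ins (a small value class is used in place of a library Point type so the example has no dependencies). Lookups scan the list linearly, which is usually fine for board-sized data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical immutable board coordinate with value equality. */
final class Cell {
    final int x;
    final int y;

    Cell(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Cell && ((Cell) o).x == x && ((Cell) o).y == y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

/** Keeps the currently valid placements in a single ArrayList. */
class ValidMoves {
    private final List<Cell> cells = new ArrayList<>();

    /** Adds the point unless it is already present. */
    void add(int x, int y) {
        Cell c = new Cell(x, y);
        if (!cells.contains(c)) {
            cells.add(c);
        }
    }

    /** Removes the point; returns false if it was not in the list. */
    boolean remove(int x, int y) {
        return cells.remove(new Cell(x, y));
    }

    boolean isValid(int x, int y) {
        return cells.contains(new Cell(x, y));
    }

    int size() {
        return cells.size();
    }
}
```

Because Cell defines equals and hashCode, the ArrayList could later be swapped for a HashSet without touching the callers if profiling ever shows the linear scan to matter.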

Java: matrix data type for inserting values based on their coordinates

I have a requirement in which I need to read values and their coordinates and place them into a matrix for displaying later.
So let's say I have the following:
<name='abc', coordinates='1,3'>
<name='xyz', coordinates='2,1'>
...............................
Now I need to put these in a 'matrix collection' based on their coordinate values and display them as a table (with cells in the table occupying the respective coordinate slots).
Is there a collection/way to do this in Java? Mind you, I don't need Swing or any graphics library techniques. I just need a data structure to do this.
Thank you
BC

You could use the Table class from Guava.

If you know in advance the boundaries of your grid, you can use a two-dimensional array:
int[][] matrix = new int[n][n];
If you do not, one way to emulate this is with a List of Lists:
ArrayList<ArrayList<Integer>> matrix = new ArrayList<ArrayList<Integer>>();

Nothing's going to do this automatically for you AFAIK. You'll need to start with extracting the data. Depending on how it's offered to you, you could use regular expressions or some specialized parser (if it's XML, there's a broad selection of tools in Java).
Next up, you're going to need to split that coordinate String. Check method split of class String.
Finally, those coordinates are gonna need to become integers. Check method parseInt of class Integer.
With these now numerical coordinates, you can insert the value into an array. If you know the maximum coordinates beforehand, you can immediately create the array. If the coordinates can be any value without bounds, you'll need some dynamic structure or regularly make a larger array and copy over the old contents.
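The split / parseInt / insert steps above might look like the following sketch. It assumes the name and the coordinate string have already been extracted from the input, that the grid bounds are known in advance, and that the coordinates are zero-based with the first number taken as the row; the class and method names are hypothetical.

```java
class CoordinateGrid {
    /**
     * Parses a coordinate string such as "1,3" and stores name in that cell.
     * Assumption: the first number is the row and the second is the column.
     */
    static void place(String[][] grid, String name, String coordinates) {
        String[] parts = coordinates.split(",");
        int row = Integer.parseInt(parts[0].trim());
        int col = Integer.parseInt(parts[1].trim());
        grid[row][col] = name;
    }

    /** Renders the grid as tab-separated lines, using "-" for empty cells. */
    static String render(String[][] grid) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : grid) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append('\t');
                }
                sb.append(row[c] == null ? "-" : row[c]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

For the two sample entries, calling place(grid, "abc", "1,3") and place(grid, "xyz", "2,1") on a new String[4][4] fills grid[1][3] and grid[2][1] and leaves every other cell empty.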
