Playing a sine wave of varying pitch - java

I have a (pretty simple) piece of code I've thrown together which generates a sine wave of a specific frequency and plays it - it works no problem:
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

public class Sine {
    private static final int SAMPLE_RATE = 16 * 1024;
    private static final int FREQ = 500;

    public static void main(String[] args) throws LineUnavailableException {
        final AudioFormat af = new AudioFormat(SAMPLE_RATE, 8, 1, true, true);
        try (SourceDataLine line = AudioSystem.getSourceDataLine(af)) {
            line.open(af, SAMPLE_RATE);
            line.start();
            play(line);
            line.drain();
        }
    }

    private static void play(SourceDataLine line) {
        byte[] arr = getData();
        line.write(arr, 0, arr.length);
    }

    private static byte[] getData() {
        final int LENGTH = SAMPLE_RATE * 100;
        final byte[] arr = new byte[LENGTH];
        for (int i = 0; i < arr.length; i++) {
            double angle = (2.0 * Math.PI * i) / (SAMPLE_RATE / FREQ);
            arr[i] = (byte) (Math.sin(angle) * 127);
        }
        return arr;
    }
}
I can also modify the getData() method to return a byte array that produces a gradual change in pitch as it plays, no problems there.
However, I'm struggling with a way to continuously play a sine wave whose frequency and amplitude I can smoothly update "live" - i.e. having FREQ in the above example changed by another thread and having the sound update in real time. I've tried creating the byte array and then filling it later in a separate thread based on the required values, but I either get nothing or distortion. I've also tried writing to the SourceDataLine in chunks, but this produces "blocks" of discrete frequencies rather than the smooth transition I'm after. A search around doesn't seem to turn up much other than what I've already tried.
It's for an emulation of a theremin, so ideally it needs to be as smooth and low-latency as possible.
I can do it ahead of time no problem - but live is proving tricky. Has anyone any ideas or examples they could share?

I wrote a Java theremin, and it can be played at this url:
http://www.hexara.com/VSL/JTheremin.htm
On that site, there are two links to the Java Gaming forum where there was some discussion on the various issues involved.
I use a wavetable rather than a sine function to generate the PCM data, but the method of changing the variable that is fed into the sine function can be set up in a similar manner.
The easiest thing to do is to have a volatile float or double in the base class that is consulted in the innermost while loop where the sound bytes are being created. Your GUI can update this variable, and the while loop can base the pitch calculation on this.
Consulting the pitch variable once per buffer load will not be satisfactory, so the next logical step is to have your while loop check this variable with every frame you process! Yes, that means referring to the pitch variable 44100 times per second, if that is your frame rate.
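To make that concrete, here is a minimal sketch of the idea (the names, the buffer size and the running flag are mine, not the poster's; SAMPLE_RATE is assumed to come from the question's class). Note that accumulating a running phase, rather than deriving the angle from the sample index as the question's getData() does, avoids clicks when the frequency changes:

// Sketch only: a volatile pitch variable consulted once per frame.
private volatile double pitchHz = 440.0;     // written by the GUI thread
private volatile boolean running = true;

private void playLoop(SourceDataLine line) {
    final byte[] buffer = new byte[256];     // small buffer keeps latency down
    double phase = 0.0;
    while (running) {
        for (int i = 0; i < buffer.length; i++) {
            // read the (possibly just-updated) pitch for every single frame
            phase += 2.0 * Math.PI * pitchHz / SAMPLE_RATE;
            if (phase > 2.0 * Math.PI) {
                phase -= 2.0 * Math.PI;
            }
            buffer[i] = (byte) (Math.sin(phase) * 127);
        }
        line.write(buffer, 0, buffer.length); // blocks until the line has room
    }
}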
But even so, the problem remains that response is limited by the manner in which the JVM time slices threads. When the sound thread is not actively looping, it is also not reading the new values that have been placed into the "pitch" variable! Recall that while the sound thread is quite able to keep the frame rate constant, it is not doing so in "real time," but in bursts of activity. Thus the GUI may overwrite the pitch value several times during the period when the sound processing thread is sleeping, resulting in pitch discontinuities.
To get around this, I made a FIFO where I store and timestamp all the GUI-generated pitch changing events. In the innermost sound processing loop, this FIFO is consulted (instead of the volatile double mentioned earlier) to determine the pitch value to be used, on a per-sample basis. Since the pitch values from a GUI will be discrete values and come at varying times, you need a method of interpolating pitch values to fill the gaps. I use the time stamps and values to calculate a per-frame interpolation, and thus update a pitch variable in the innermost loop every sample.
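The answer doesn't include code, but a rough sketch of such a FIFO could look like the following. The class and field names are invented, and the interpolation here is a simplified per-frame glide toward each queued target rather than the timestamp-based calculation described above:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch only: the GUI thread enqueues pitch targets, the audio thread glides toward them per frame.
public class PitchEvents {
    private static final double GLIDE_FRAMES = 512;   // larger = slower, smoother glide

    private final Queue<Double> targets = new ConcurrentLinkedQueue<>();
    private double currentPitchHz = 440.0;            // only touched by the audio thread

    // Called from the GUI/controller thread.
    public void pitchChanged(double newPitchHz) {
        targets.add(newPitchHz);
    }

    // Called by the audio thread once per frame; returns the pitch to use for this frame.
    public double nextPitch() {
        Double target = targets.peek();
        if (target == null) {
            return currentPitchHz;                     // nothing pending, hold the pitch
        }
        currentPitchHz += (target - currentPitchHz) / GLIDE_FRAMES;
        if (Math.abs(target - currentPitchHz) < 0.01) {
            currentPitchHz = target;                   // close enough: snap and consume the event
            targets.poll();
        }
        return currentPitchHz;
    }
}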
I think there are a lot of issues, still, with the solution I wrote, and am looking forward to revisiting this!

It looks like you are only reading from the data array once, so regardless of whether the data is modified, only one pitch will be produced. I would think you would need to play a shorter wave inside a loop that rereads the data array each iteration. I don't know how the SourceDataLine class functions, though, so I don't know whether this would produce the sound without audible seams.

Related

Invert image pixels after 1 inversion

I have a piece of code, seen below, that I'm using to invert the pixel data of an image. This code works fine on the initial inversion (black becomes white, white becomes black, etc.). However, when I take the inverted image file and rerun it through this code to try to get the original image back, the values are nowhere near the original and the image has random dark patches and very high contrast.
So tell me, what is a better way to get the inversion of the inversion? Does the below code need tweaking? Is there another way to go about this entirely?
Code:
// bitsStored is the bit depth. In this test, it is 10.
// imageBytes is the pixel data in a byte array
public static short[] invert(int bitsStored) {
    short min = min(imageBytes); // custom method. Gets the minimum value in the byte array.
    short range = (short) (2 << bitsStored);
    short[] holder = new short[imageBytes.length];
    for (int i = 0; i < imageBytes.length; i++) {
        holder[i] = (short) (range - imageBytes[i] - min);
    }
    imageBytes = holder;
    return imageBytes;
}
Note: The image I'm using has a 16-bit depth but only uses 10 bits for storage.
Let me know if there is any way I can make my question clearer. Thank you!
EDIT: I have an idea. Could this be happening because the min value changes between the first run and the second? I feel like, in concept, the inversion of an inversion should be the original. But in the math, the only number that is the same between the two runs is the range value. So there has to be a better way to do this. I'll continue to think about it, but any insights you guys have on it would be much appreciated.
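For what it's worth, the algebra supports that suspicion: if the first pass computes inv = range - v - min1, the second pass gives range - inv - min2 = v + min1 - min2, which only equals v when both passes see the same minimum. A quick check with made-up numbers (not the asker's actual data):

int range = 2 << 10;           // 2048, as in the code above
int v = 300, min1 = 50;        // hypothetical original pixel value and first minimum
int inv = range - v - min1;    // 1698
int min2 = 900;                // hypothetical minimum of the inverted image
int back = range - inv - min2; // 2048 - 1698 - 900 = -550, not the original 300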

Data Structure and Algorithm for a 3D Volume?

I've been tinkering with some Minecraft Bukkit plugin development, and am currently working on something where I need to be able to define a "volume" of space and determine when an entity (player) moves from outside that volume to inside (or vice versa).
If I restrict the "volume" to boxes, it should be simple. The data structure can just maintain the X/Y/Z bounding integers (so 6 total integers), and calculating entry/exit given two points (movement from and movement to) should just be a matter of determining whether A) all three To values are within all three ranges and B) at least one From value is outside its corresponding range.
(Though if there's a better, more performant way of storing and calculating this, I'm all ears.)
However, what if the "volume" isn't a simple box? Suppose I have an oddly-shaped room and want to enclose the volume of that room. I could arrange multiple "volumes" individually to fill the overall space, however that would result in false positives when an entity moves from one to another.
Not having worked in gaming or 3D engines before, I'm drawing a blank on how I might be able to structure something like this. But it occurs to me that this is likely a problem which has been solved and has known established patterns. Essentially, I'm trying to:
Define a data structure which can represent an oddly-shaped volume of space (albeit at least based on block coordinates).
Define an algorithm which, given a source and destination of movement, can determine if the movement crossed a boundary of the defined space.
Are there established patterns and practices for this?
I don't know if this has been used in any kind of video game before, but the first thing that came to mind is the classic Sieve of Eratosthenes implementation; the only change would be to make the boolean array 3D and use the indices as coordinates. Obviously, though, as x and y values can be huge in Minecraft, you'd probably want to save space by storing an offset between the world 0,0 position and your selection, something like this:
class OddArea
{
    static final int MAX_SELECTION_SIZE = 64; // Or whatever

    public final int xOffset, yOffset;

    // 256 = Chunk height
    public final boolean[][][] squares = new boolean[MAX_SELECTION_SIZE][MAX_SELECTION_SIZE][256];

    OddArea()
    {
        this(0, 0);
    }

    OddArea(final int xOffset, final int yOffset)
    {
        this.xOffset = xOffset;
        this.yOffset = yOffset;
    }

    void addBlock(final int x, final int y, final int z)
    {
        this.squares[x - this.xOffset][y - this.yOffset][z] = true;
    }

    boolean isInsideArea(final int x, final int y, final int z)
    {
        return this.squares[x - this.xOffset][y - this.yOffset][z];
    }
}
z doesn't require an offset as the Minecraft world is only 256 blocks high.
The only issue I can think of with this setup is that you'd have to know the lowest x,y coordinates before you start filling up your object.
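For the second requirement (detecting when a move crosses the boundary), the structure above could be queried roughly like this - a hypothetical helper, not part of the answer's code:

// Movement crosses the boundary when exactly one of the two endpoints is inside the volume.
static boolean crossesBoundary(OddArea area,
                               int fromX, int fromY, int fromZ,
                               int toX, int toY, int toZ) {
    return area.isInsideArea(fromX, fromY, fromZ) != area.isInsideArea(toX, toY, toZ);
}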
In general you should be using a data structure similar to k-d trees. You can represent your volume as a union of either cubes or spheres, and it should be easy to evaluate whether an object enters the volume.
BTW, to calculate if two spheres intersect, check whether the distance between the centers is less than the sum of the radii.
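A small sketch of that test (comparing squared distances avoids the square root):

// Two spheres intersect when the distance between their centres is at most the sum of their radii.
static boolean spheresIntersect(double x1, double y1, double z1, double r1,
                                double x2, double y2, double z2, double r2) {
    double dx = x1 - x2, dy = y1 - y2, dz = z1 - z2;
    double radiusSum = r1 + r2;
    return dx * dx + dy * dy + dz * dz <= radiusSum * radiusSum;
}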

Neural Network Back-Propagation Algorithm Gets Stuck on XOR Training Pattern

Overview
So I'm trying to get a grasp on the mechanics of neural networks. I still don't totally grasp the math behind it, but I think I understand how to implement it. I currently have a neural net that can learn AND, OR, and NOR training patterns. However, I can't seem to get it to implement the XOR pattern. My feed-forward neural network consists of 2 inputs, 3 hidden, and 1 output. The weights and biases are randomly set between -0.5 and 0.5, and outputs are generated with the sigmoidal activation function.
Algorithm
So far, I'm guessing I made a mistake in my training algorithm which is described below:
1. For each neuron in the output layer, provide an error value that is the desiredOutput - actualOutput --go to step 3
2. For each neuron in a hidden or input layer (working backwards), provide an error value that is the sum of all forward connection weights * the errorGradient of the neuron at the other end of the connection --go to step 3
3. For each neuron, using the error value provided, generate an error gradient that equals output * (1 - output) * error --go to step 4
4. For each neuron, adjust the bias to equal current bias + LEARNING_RATE * errorGradient. Then adjust each backward connection's weight to equal current weight + LEARNING_RATE * output of neuron at other end of connection * this neuron's errorGradient
I'm training my neural net online, so this runs after each training sample.
Code
This is the main code that runs the neural network:
private void simulate(double maximumError) {
    int errorRepeatCount = 0;
    double prevError = 0;
    double error; // summed squares of errors
    int trialCount = 0;
    do {
        error = 0;
        // loop through each training set
        for (int index = 0; index < Parameters.INPUT_TRAINING_SET.length; index++) {
            double[] currentInput = Parameters.INPUT_TRAINING_SET[index];
            double[] expectedOutput = Parameters.OUTPUT_TRAINING_SET[index];
            double[] output = getOutput(currentInput);
            train(expectedOutput);
            // Subtracts the expected and actual outputs, gets the average of those outputs, and then squares it.
            error += Math.pow(getAverage(subtractArray(output, expectedOutput)), 2);
        }
    } while (error > maximumError);
}
Now the train() function:
public void train(double[] expected) {
    layers.outputLayer().calculateErrors(expected);
    for (int i = Parameters.NUM_HIDDEN_LAYERS; i >= 0; i--) {
        layers.allLayers[i].calculateErrors();
    }
}
Output layer calculateErrors() function:
public void calculateErrors(double[] expectedOutput) {
    for (int i = 0; i < numNeurons; i++) {
        Neuron neuron = neurons[i];
        double error = expectedOutput[i] - neuron.getOutput();
        neuron.train(error);
    }
}
Normal (Hidden & Input) layer calculateErrors() function:
public void calculateErrors() {
    for (int i = 0; i < neurons.length; i++) {
        Neuron neuron = neurons[i];
        double error = 0;
        for (Connection connection : neuron.forwardConnections) {
            error += connection.output.errorGradient * connection.weight;
        }
        neuron.train(error);
    }
}
Full Neuron class:
package neuralNet.layers.neurons;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import neuralNet.Parameters;
import neuralNet.layers.NeuronLayer;

public class Neuron {
    private double output, bias;
    public List<Connection> forwardConnections = new ArrayList<Connection>();  // Forward = layer closer to input -> layer closer to output
    public List<Connection> backwardConnections = new ArrayList<Connection>(); // Backward = layer closer to output -> layer closer to input
    public double errorGradient;

    public Neuron() {
        Random random = new Random();
        bias = random.nextDouble() - 0.5;
    }

    public void addConnections(NeuronLayer prevLayer) {
        // This is true for input layers. They create their connections differently. (See InputLayer class)
        if (prevLayer == null) return;
        for (Neuron neuron : prevLayer.neurons) {
            Connection.createConnection(neuron, this);
        }
    }

    public void calcOutput() {
        output = bias;
        for (Connection connection : backwardConnections) {
            connection.input.calcOutput();
            output += connection.input.getOutput() * connection.weight;
        }
        output = sigmoid(output);
    }

    private double sigmoid(double output) {
        return 1 / (1 + Math.exp(-1 * output));
    }

    public double getOutput() {
        return output;
    }

    public void train(double error) {
        this.errorGradient = output * (1 - output) * error;
        bias += Parameters.LEARNING_RATE * errorGradient;
        for (Connection connection : backwardConnections) {
            // for clarification: connection.input refers to a neuron that outputs to this neuron
            connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
        }
    }
}
Results
When I'm training for AND, OR, or NOR the network can usually converge within about 1000 epochs, however when I train with XOR, the outputs become fixed and it never converges. So, what am I doing wrong? Any ideas?
Edit
Following the advice of others, I started over and implemented my neural network without classes...and it works. I'm still not sure where my problem lies in the above code, but it's in there somewhere.
This is surprising because you are using a network that is (barely) big enough to learn XOR. Your algorithm looks right, so I don't really know what is going on. It might help to know how you generate your training data: are you just repeating the samples (1,0,1), (1,1,0), (0,1,1), (0,0,0) or something like that over and over? Perhaps the problem is that stochastic gradient descent is causing you to jump around instead of settling into the minimum. You could try some things to fix this: perhaps randomly sample from your training examples instead of repeating them (if that is what you are doing). Or, alternatively, you could modify your learning algorithm:
currently you have something equivalent to:
weight(epoch) = weight(epoch - 1) + deltaWeight(epoch)
deltaWeight(epoch) = mu * errorGradient(epoch)
where mu is the learning rate
One option is to very slowly decrease the value of mu.
An alternative would be to change your definition of deltaWeight to include a "momentum" term:
deltaWeight(epoch) = mu * errorGradient(epoch) + alpha * deltaWeight(epoch -1)
where alpha is the momentum parameter (between 0 and 1).
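In code, one way to express that update is the following sketch (the per-weight prevDelta storage is an assumption of mine; it does not exist in the asker's classes):

// Sketch only: momentum-based weight change.
// mu = learning rate, alpha = momentum, prevDelta = the delta applied in the previous epoch.
static double deltaWeight(double mu, double alpha,
                          double errorGradient, double inputValue, double prevDelta) {
    return mu * errorGradient * inputValue + alpha * prevDelta;
}

// Usage per weight: delta = deltaWeight(...); weight += delta; prevDelta = delta;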
Visually, you can think of gradient descent as trying to find the minimum point of a curved surface by placing an object on that surface, and then step by step moving that object small amounts in whichever direction is sloping down based on where it is currently located. The problem is that you don't really do gradient descent: instead you do stochastic gradient descent, where you pick a direction by sampling from a set of training vectors and moving in whatever direction the sample makes look like down. On average over the entire training data, stochastic gradient descent should work, but it isn't guaranteed to, because you can get into a situation where you jump back and forth and never make progress. Slowly decreasing the learning rate means you take smaller and smaller steps each time, so you cannot get stuck in an infinite cycle.
On the other hand, momentum makes the algorithm into something akin to rolling a rubber ball. As the ball rolls it tends to go in the downward direction, but it also tends to keep going in the direction it was going before, and if it is ever on a stretch where the down slope is in the same direction for a while it will speed up. The ball will therefore jump over some local minima, and it will be more resilient against stepping back and forth over the target, because doing so means working against the force of momentum.
Having seen some code and thought about this some more, it is pretty clear that your problem is in training the early layers. The functions you have successfully learned are all linearly separable, so it would make sense that only a single layer is being properly learned. I agree with LiKao about implementation strategies in general, although your approach should work. My suggestion for how to debug this is to figure out what the progression of the weights on the connections between the input layer and the output layer looks like.
You should post the rest of the implementation of Neuron.
I faced the same problem a short time ago. I finally found a solution for how to write code that solves XOR with the MLP algorithm.
The XOR problem seems to be an easy-to-learn problem, but it isn't for the MLP, because it's not linearly separable. So even if your MLP is OK (I mean there is no bug in your code), you have to find the right parameters to be able to learn the XOR problem.
Two hidden and one output neuron is fine. The two main things you have to set:
although you have only 4 training samples, you have to run the training for a couple of thousand epochs.
if you use sigmoid hidden layers but a linear output, the network will converge faster
Here is the detailed description and sample code: http://freeconnection.blogspot.hu/2012/09/solving-xor-with-mlp.html
Small hint - if the output of your NN seems to drift toward 0.5 then everything's OK!
The algorithm using just the learning rate and bias is just too simple to quickly learn XOR. You can either increase the number of epochs or change the algorithm.
My recommendation is to use momentum:
1000 epochs
learningRate = 0.3
momentum = 0.8
weights drawn from [0,1]
bias drawn from [-0.5, 0.5]
And some crucial pseudocode (assuming backward and forward propagation works):

for every edge:
    previous_edge_weight_change = -1 * learningRate * edge_source_neuron_value * edge_target_neuron_delta + previous_edge_weight_change * momentum
    edge_weight += previous_edge_weight_change

for every neuron:
    previous_neuron_bias_change = -1 * learningRate * neuron_delta + previous_neuron_bias_change * momentum
    bias += previous_neuron_bias_change
I would suggest generating a grid (say from [-5,-5] to [5,5] with a step like 0.5), training your MLP on XOR, and applying it to the grid. Plotted in color, you could see some kind of frontier.
If you do that at each iteration, you'll see the evolution of the frontier and can keep an eye on the learning.
It's been a while since I last implemented a neural network myself, but I think your mistake is in the lines:
bias += Parameters.LEARNING_RATE * errorGradient;
and
connection.weight += Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
The first of these lines should not be there at all. Bias is best modeled as the input of a neuron which is fixed at 1. This will serve to make your code a lot simpler and cleaner, because you will not have to treat the bias in any special way.
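One way to model that - purely a sketch against the Neuron class above, not the poster's code - is a neuron whose output is always 1.0, wired in as an ordinary backward connection; its outgoing weight then plays the role of the bias and is trained like every other weight:

// Hypothetical bias neuron: constant output of 1.0, learned via its outgoing connection weight.
public class BiasNeuron extends Neuron {
    @Override
    public void calcOutput() {
        // nothing to compute - the output is constant
    }

    @Override
    public double getOutput() {
        return 1.0;
    }
}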
The other point is that I think the sign is wrong in both of these expressions. Think about it like this:
Your gradient points in the direction of steepest ascent, so if you go in that direction, your error will get larger.
What you are doing here is adding something to the weights in case the error is already positive, i.e. you are making it more positive. If it is negative, you are subtracting something, i.e. you are making it more negative.
Unless I am missing something about your definition of error or the gradient calculation you should change these lines to:
bias -= Parameters.LEARNING_RATE * errorGradient;
and
connection.weight -= Parameters.LEARNING_RATE * connection.input.getOutput() * errorGradient;
I had a similar mistake in one of my early implementations, and it led to exactly the same behaviour, i.e. a network that learned in simple cases but no longer did once the training data became more complex.
LiKao's suggestion to simplify my implementation and get rid of the object-oriented aspects solved my problem. The flaw in the algorithm as described above is unknown; however, I now have a working neural network that is a lot smaller.
Feel free to continue to provide insight on the problem with my previous implementation, as others may have the same problem in the future.
I'm a bit rusty on neural networks, but I think there is a problem with implementing XOR with a single perceptron: basically, a neuron is able to separate two groups of solutions with a straight line, but one straight line is not sufficient for the XOR problem...
Here should be the answer!
I couldn't see anything wrong with the code, but I was having a similar problem with my network not converging for XOR, so figured I'd post my working configuration.
3 input neurons (one of them being a fixed bias of 1.0)
3 hidden neurons
1 output neuron
Weights randomly chosen between -0.5 and 0.5.
Sigmoid activation function.
Learning rate = 0.2
Momentum = 0.4
Epochs = 50,000
Converged 10/10 times.
One of the mistakes I was making was not connecting the bias input to the output neuron, and this would mean for the same configuration it only converged 2 out of 10 times with the other eight times failing because 1 and 1 would output 0.5.
Another mistake was not doing enough epochs. If I only did 1000, then the outputs tended to be around 0.5 for every test case. With epochs >= 8000 (so 2000 passes over each test case), it started to look like it might be working (but only if using momentum).
When doing 50000 epochs it did not matter whether momentum was used or not.
Another thing I tried was to not apply the sigmoid function to the output neuron's output (which I think is what an earlier post had suggested), but this wrecked the network because the output*(1-output) part of the error equation could now be negative, meaning weights were updated in a way that caused the error to increase.

Java: micro-optimizing array manipulation

I am trying to make a Java port of a simple feed-forward neural network.
This obviously involves lots of numeric calculations, so I am trying to optimize my central loop as much as possible. The results should be correct within the limits of the float data type.
My current code looks as follows (error handling & initialization removed):
/**
 * Simple implementation of a feedforward neural network. The network supports
 * including a bias neuron with a constant output of 1.0 and weighted synapses
 * to hidden and output layers.
 *
 * @author Martin Wiboe
 */
public class FeedForwardNetwork {
    private final int outputNeurons;    // No of neurons in output layer
    private final int inputNeurons;     // No of neurons in input layer
    private int largestLayerNeurons;    // No of neurons in largest layer
    private final int numberLayers;     // No of layers
    private final int[] neuronCounts;   // Neuron count in each layer, 0 is input layer.
    private final float[][][] fWeights; // Weights between neurons.
                                        // fWeight[fromLayer][fromNeuron][toNeuron]
                                        // is the weight from fromNeuron in
                                        // fromLayer to toNeuron in layer
                                        // fromLayer+1.
    private float[][] neuronOutput;     // Temporary storage of output from previous layer

    public float[] compute(float[] input) {
        // Copy input values to input layer output
        for (int i = 0; i < inputNeurons; i++) {
            neuronOutput[0][i] = input[i];
        }

        // Loop through layers
        for (int layer = 1; layer < numberLayers; layer++) {
            // Loop over neurons in the layer and determine weighted input sum
            for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) {
                // Bias neuron is the last neuron in the previous layer
                int biasNeuron = neuronCounts[layer - 1];

                // Get weighted input from bias neuron - output is always 1.0
                float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];

                // Get weighted inputs from rest of neurons in previous layer
                for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
                    activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][inputNeuron][neuron];
                }

                // Store neuron output for next round of computation
                neuronOutput[layer][neuron] = sigmoid(activation);
            }
        }

        // Return output from network = output from last layer
        float[] result = new float[outputNeurons];
        for (int i = 0; i < outputNeurons; i++)
            result[i] = neuronOutput[numberLayers - 1][i];
        return result;
    }

    private final static float sigmoid(final float input) {
        return (float) (1.0F / (1.0F + Math.exp(-1.0F * input)));
    }
}
I am running the JVM with the -server option, and as of now my code is between 25% and 50% slower than similar C code. What can I do to improve this situation?
Thank you,
Martin Wiboe
Edit #1: After seeing the vast number of responses, I should probably clarify the numbers in our scenario. During a typical run, the method will be called about 50,000 times with different inputs. A typical network would have numberLayers = 3 layers with 190, 2 and 1 neurons, respectively. The innermost loop will therefore have about 2*191+3 = 385 iterations (when counting the added bias neuron in layers 0 and 1).
Edit #2: After implementing the various suggestions in this thread, our implementation is practically as fast as the C version (within ~2%). Thanks for all the help! All of the suggestions have been helpful, but since I can only mark one answer as the correct one, I will give it to @Durandal for both suggesting array optimizations and being the only one to precalculate the for-loop header.
Some tips.
In your innermost loop, think about how you are traversing your CPU cache and re-arrange your matrix so you are accessing the outermost array sequentially. This will result in you accessing your cache in order rather than jumping all over the place. A cache hit can be two orders of magnitude faster than a cache miss.
e.g. restructure fWeights so it is accessed as
activation += neuronOutput[layer-1][inputNeuron] * fWeights[layer - 1][neuron][inputNeuron];
Don't perform work inside the loop (every time) which can be done outside the loop (once). Don't perform the [layer - 1] lookup every time when you can place it in a local variable. Your IDE should be able to refactor this easily.
Multi-dimensional arrays in Java are not as efficient as they are in C. They are actually multiple layers of single-dimensional arrays. You can restructure the code so you're only using a single-dimensional array.
Don't return a new array when you can pass the result array as an argument. (This saves creating a new object on each call.)
Rather than performing layer - 1 all over the place, why not use layer1 as layer - 1 and use layer1 + 1 instead of layer?
Disregarding the actual math, the array indexing in Java can be a performance hog in itself. Consider that Java has no real multidimensional arrays, but rather implements them as arrays of arrays. In your innermost loop, you access over multiple indices, some of which are in fact constant in that loop. Part of the array access can be moved outside of the loop:
final float[] neuronOutputSlice = neuronOutput[layer - 1];
final float[][] fWeightsSlice = fWeights[layer - 1];
for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}
It is possible that the server JIT performs a similar code-invariant movement; the only way to find out is to change it and profile. On the client JIT this should improve performance no matter what.
Another thing you can try is to precalculate the for-loop exit conditions, like this:
for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) { ... }
// transform to precalculated exit condition (move invariant array access outside loop)
for (int neuron = 0, neuronCount = neuronCounts[layer]; neuron < neuronCount; neuron++) { ... }
Again, the JIT may already do this for you, so profile to see whether it helps.
Is there a point to multiplying with 1.0F that eludes me here?:
float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];
Other things that could potentially improve speed at the cost of readability: inline the sigmoid() function manually (the JIT has a very tight limit for inlining and the function might be larger).
It can be slightly faster to run a loop backwards (where it doesn't change the outcome, of course), since testing the loop index against zero is a little cheaper than checking against a local variable (the innermost loop is a potential candidate again, but don't expect the output to be 100% identical in all cases, since adding floats a + b + c is potentially not the same as a + c + b).
For a start, don't do this:
// Copy input values to input layer output
for (int i = 0; i < inputNeurons; i++) {
neuronOutput[0][i] = input[i];
}
But this:
System.arraycopy( input, 0, neuronOutput[0], 0, inputNeurons );
First thing I would look into is seeing if Math.exp is slowing you down. See this post on a Math.exp approximation for a native alternative.
Replace the expensive floating point sigmoid transfer function with an integer step transfer function.
The sigmoid transfer function is a model of organic analog synaptic learning, which in turn seems to be a model of a step function.
The historical precedent for this is that Hinton designed the back-prop algorithm directly from the first principles of cognitive science theories about real synapses, which in turn were based on real analog measurements, which turn out to be sigmoid.
But the sigmoid transfer function seems to be an organic model of the digital step function, which of course cannot be directly implemented organically.
Rather than model a model, replace the expensive floating point implementation of the organic sigmoid transfer function with the direct digital implementation of a step function (less than zero = -1, greater than zero = +1).
The brain cannot do this, but backprop can!
This not only linearly and drastically improves performance of a single learning iteration, it also reduces the number of learning iterations required to train the network: supporting evidence that learning is inherently digital.
Also supports the argument that Computer Science is inherently cool.
Purely based upon code inspection, your innermost loop has to compute references into a three-dimensional array, and it's being done a lot. Depending upon your array dimensions, you could be having cache issues due to having to jump around memory with each loop iteration. Maybe you could rearrange the dimensions so the inner loop tries to access memory elements that are closer to one another than they are now?
In any case, profile your code before making any changes and see where the real bottleneck is.
I suggest using a fixed-point system rather than a floating-point system. On almost all processors, using int is faster than float. The simplest way to do this is simply to shift everything left by a certain amount (4 or 5 are good starting points) and treat the bottom 4 bits as the decimal part.
Your innermost loop is doing floating-point maths, so this may give you quite a boost.
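For illustration, a fixed-point scheme along those lines could look like this (a 4-bit fractional part as suggested above; whether it actually beats JIT-compiled float code would need profiling):

// Sketch only: 4 fractional bits, so 1.0 is represented as 1 << 4 = 16.
static final int SHIFT = 4;

static int toFixed(float f)  { return Math.round(f * (1 << SHIFT)); }
static float toFloat(int fx) { return fx / (float) (1 << SHIFT); }

// A fixed-point multiply needs one extra shift to stay in scale.
static int mulFixed(int a, int b) { return (a * b) >> SHIFT; }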
The key to optimization is to first measure where the time is spent. Surround various parts of your algorithm with calls to System.nanoTime():
long start_time = System.nanoTime();
doStuff();
long time_taken = System.nanoTime() - start_time;
I'd guess that while using System.arraycopy() would help a bit, you'll find your real costs in the inner loop.
Depending on what you find, you might consider replacing the float arithmetic with integer arithmetic.

Any code tips for speeding up random reads from a Java FileChannel?

I have a large (3Gb) binary file of doubles which I access (more or less) randomly during an iterative algorithm I have written for clustering data. Each iteration does about half a million reads from the file and about 100k writes of new values.
I create the FileChannel like this...
f = new File(_filename);
_ioFile = new RandomAccessFile(f, "rw");
_ioFile.setLength(_extent * BLOCK_SIZE);
_ioChannel = _ioFile.getChannel();
I then use a private ByteBuffer the size of a double to read from it
private ByteBuffer _double_bb = ByteBuffer.allocate(8);
and my reading code looks like this
public double GetValue(long lRow, long lCol)
{
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long position = idx * BLOCK_SIZE;
    double d = 0;
    try
    {
        _double_bb.position(0);
        _ioChannel.read(_double_bb, position);
        d = _double_bb.getDouble(0);
    }
    ...snip...
    return d;
}
and I write to it like this...
public void SetValue(long lRow, long lCol, double d)
{
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long offset = idx * BLOCK_SIZE;
    try
    {
        _double_bb.putDouble(0, d);
        _double_bb.position(0);
        _ioChannel.write(_double_bb, offset);
    }
    ...snip...
}
The time taken for an iteration of my code increases roughly linearly with the number of reads. I have added a number of optimisations to the surrounding code to minimise the number of reads, but I am now down to the core set that I feel is necessary without fundamentally altering how the algorithm works, which I want to avoid at the moment.
So my question is whether there is anything in the read/write code or JVM configuration I can do to speed up the reads? I realise I can change hardware, but before I do that I want to make sure that I have squeezed every last drop of software juice out of the problem.
Thanks in advance
As long as your file is stored on a regular harddisk, you will get the biggest possible speedup by organizing your data in a way that gives your accesses locality, i.e. causes as many get/set calls in a row as possible to access the same small area of the file.
This is more important than anything else you can do because accessing random spots on a HD is by far the slowest thing a modern PC does - it takes about 10,000 times longer than anything else.
So if it's possible to work on only a part of the dataset (small enough to fit comfortably into the in-memory HD cache) at a time and then combine the results, do that.
Alternatively, avoid the issue by storing your file on an SSD or (better) in RAM. Even storing it on a simple thumb drive could be a big improvement.
Instead of reading into a ByteBuffer, I would use file mapping, see: FileChannel.map().
Also, you don't really explain how your GetValue(row, col) and SetValue(row, col) access the storage. Are row and col more or less random? The idea I have in mind is the following: sometimes, for image processing, when you have to access pixels like row + 1, row - 1, col - 1, col + 1 to average values, one trick is to organize the data in 8 x 8 or 16 x 16 blocks. Doing so helps keep the different pixels of interest in a contiguous memory area (and hopefully in the cache).
You might transpose this idea to your algorithm (if it applies): you map a portion of your file once, so that the different calls to GetValue(row, col) and SetValue(row, col) work on this portion that's just been mapped.
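A minimal sketch of the mapping idea (the names are invented; note that a single MappedByteBuffer is limited to 2 GB, so a 3 GB file would need more than one mapped region):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Sketch only: map one region of the file and read/write doubles through it.
class MappedDoubles {
    private final MappedByteBuffer map;

    MappedDoubles(String filename, long regionBytes) throws IOException {
        RandomAccessFile file = new RandomAccessFile(filename, "rw");
        map = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, regionBytes);
    }

    double get(long idx) {
        return map.getDouble((int) (idx * 8));
    }

    void set(long idx, double d) {
        map.putDouble((int) (idx * 8), d);
    }
}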
Presumably if we can reduce the number of reads then things will go more quickly.
3Gb isn't huge for a 64 bit JVM, hence quite a lot of the file would fit in memory.
Suppose that you treat the file as "pages" which you cache. When you read a value, read the page around it and keep it in memory. Then when you do more reads check the cache first.
Or, if you have the capacity, read the whole thing into memory at the start of processing.
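A rough sketch of the paging idea, reusing the question's _ioChannel and its 8-byte BLOCK_SIZE (the page size and the unbounded HashMap are arbitrary choices of mine; writes would also need to go through, or invalidate, the cache):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Sketch only: read a whole page of doubles in one go and keep recently used pages in memory.
private static final int PAGE_DOUBLES = 4096;                // 32 KB per page
private final Map<Long, double[]> pageCache = new HashMap<>();

public double getValueCached(long idx) throws IOException {
    long pageNo = idx / PAGE_DOUBLES;
    double[] page = pageCache.get(pageNo);
    if (page == null) {
        ByteBuffer buf = ByteBuffer.allocate(PAGE_DOUBLES * 8);
        _ioChannel.read(buf, pageNo * PAGE_DOUBLES * 8L);    // one bulk read instead of 4096 tiny ones
        buf.flip();
        page = new double[PAGE_DOUBLES];
        buf.asDoubleBuffer().get(page);
        pageCache.put(pageNo, page);
    }
    return page[(int) (idx % PAGE_DOUBLES)];
}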
Byte-by-byte access always produces poor performance (not only in Java). Try to read/write bigger blocks (e.g. rows or columns).
How about switching to a database engine for handling such amounts of data? It would handle all the optimizations for you.
Maybe this article will help you ...
You might want to consider using a library which is designed for managing large amounts of data and random reads rather than using raw file access routines.
The HDF file format may be a good fit. It has a Java API but is not pure Java. It's licensed under an Apache-style license.
