I am trying to write my first neural network to play the game Connect Four.
I'm using Java and deeplearning4j.
I tried to implement a genetic algorithm, but when I train the network for a while, its outputs jump to NaN, and I am unable to tell where I messed up so badly for this to happen.
I will post all 3 classes below, where Game is the game logic and rules, VGFrame the UI and Main all the NN stuff.
I have a pool of 35 neural networks, and each iteration I let the best 5 live and breed, and randomize the newly created ones a little.
To evaluate the networks I let them battle each other and give points to the winner, and points to the loser for losing later.
Since I penalize putting a stone into a column that's already full, I expected the neural networks to at least be able to play the game by the rules after a while, but they can't do this.
I googled the NaN problem and it seems to be an exploding gradient problem, but from my understanding this shouldn't occur in a genetic algorithm?
Any ideas where I could look for the error, or what's generally wrong with my implementation?
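For reference, the "breed and randomize a little" step described above can be sketched on plain weight arrays. This is a minimal illustration assuming uniform crossover and a small uniform mutation; it is not the code posted below, and the class and method names (`GeneticStepSketch`, `crossover`, `mutate`) are made up for the example.

```java
import java.util.Random;

// Hypothetical helper class, independent of deeplearning4j.
final class GeneticStepSketch {
    /** Uniform crossover: each weight is copied from one parent or the other. */
    static double[] crossover(double[] a, double[] b, Random rng) {
        double[] child = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            child[i] = rng.nextBoolean() ? a[i] : b[i];
        }
        return child;
    }

    /** With probability 'chance', shifts a weight by a value in [-scale/2, scale/2). */
    static double[] mutate(double[] weights, double chance, double scale, Random rng) {
        double[] out = weights.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < chance) {
                out[i] += (rng.nextDouble() - 0.5) * scale;
            }
        }
        return out;
    }
}
```

Because the child only ever copies a parent's weight, its values stay within the range spanned by the two parents before mutation is applied.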
Main
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
public class Main {
final int numRows = 7;
final int numColums = 6;
final int randSeed = 123;
MultiLayerNetwork[] models;
static Random random = new Random();
private static final Logger log = LoggerFactory.getLogger(Main.class);
final float learningRate = .8f;
int batchSize = 64; // Test batch size
int nEpochs = 1; // Number of training epochs
// --
public static Main current;
Game mainGame = new Game();
public static void main(String[] args) {
current = new Main();
current.frame = new VGFrame();
current.loadWeights();
}
private VGFrame frame;
private final double mutationChance = .05;
public Main() {
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder().weightInit(WeightInit.XAVIER)
.activation(Activation.RELU).seed(randSeed)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).updater(new Nesterovs(0.1, 0.9))
.list()
.layer(new DenseLayer.Builder().nIn(42).nOut(30).activation(Activation.RELU)
.weightInit(WeightInit.XAVIER).build())
.layer(new DenseLayer.Builder().nIn(30).nOut(15).activation(Activation.RELU)
.weightInit(WeightInit.XAVIER).build())
.layer(new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD).nIn(15).nOut(7)
.activation(Activation.SOFTMAX).weightInit(WeightInit.XAVIER).build())
.build();
models = new MultiLayerNetwork[35];
for (int i = 0; i < models.length; i++) {
models[i] = new MultiLayerNetwork(conf);
models[i].init();
}
}
public void addChip(int i, boolean b) {
if (mainGame.gameState == 0)
mainGame.addChip(i, b);
if (mainGame.gameState == 0) {
float[] f = Main.rowsToInput(mainGame.rows);
INDArray input = Nd4j.create(f);
INDArray output = models[0].output(input);
for (int i1 = 0; i1 < 7; i1++) {
System.out.println(i1 + ": " + output.getDouble(i1));
}
System.out.println("----------------");
mainGame.addChip(Main.getHighestOutput(output), false);
}
getFrame().paint(getFrame().getGraphics());
}
public void newGame() {
mainGame = new Game();
getFrame().paint(getFrame().getGraphics());
}
public void startTraining(int iterations) {
// --------------------------
for (int gameNumber = 0; gameNumber < iterations; gameNumber++) {
System.out.println("Iteration " + gameNumber + " of " + iterations);
float[] evaluation = new float[models.length];
for (int i = 0; i < models.length; i++) {
for (int j = 0; j < models.length; j++) {
if (i != j) {
Game g = new Game();
g.playFullGame(models[i], models[j]);
if (g.gameState == 1) {
evaluation[i] += 45;
evaluation[j] += g.turnNumber;
}
if (g.gameState == 2) {
evaluation[j] += 45;
evaluation[i] += g.turnNumber;
}
}
}
}
float[] evaluationSorted = evaluation.clone();
Arrays.sort(evaluationSorted);
// keep the best 4
int n1 = 0, n2 = 0, n3 = 0, n4 = 0, n5 = 0;
for (int i = 0; i < evaluation.length; i++) {
if (evaluation[i] == evaluationSorted[evaluationSorted.length - 1])
n1 = i;
if (evaluation[i] == evaluationSorted[evaluationSorted.length - 2])
n2 = i;
if (evaluation[i] == evaluationSorted[evaluationSorted.length - 3])
n3 = i;
if (evaluation[i] == evaluationSorted[evaluationSorted.length - 4])
n4 = i;
if (evaluation[i] == evaluationSorted[evaluationSorted.length - 5])
n5 = i;
}
models[0] = models[n1];
models[1] = models[n2];
models[2] = models[n3];
models[3] = models[n4];
models[4] = models[n5];
for (int i = 3; i < evaluationSorted.length; i++) {
// random parent/keep w8ts
double r = Math.random();
if (r > .3) {
models[i] = models[random.nextInt(3)].clone();
} else if (r > .1) {
models[i].setParams(breed(models[random.nextInt(3)], models[random.nextInt(3)]));
}
// Mutate
INDArray params = models[i].params();
models[i].setParams(mutate(params));
}
}
}
private INDArray mutate(INDArray params) {
double[] d = params.toDoubleVector();
for (int i = 0; i < d.length; i++) {
if (Math.random() < mutationChance)
d[i] += (Math.random() - .5) * learningRate;
}
return Nd4j.create(d);
}
private INDArray breed(MultiLayerNetwork m1, MultiLayerNetwork m2) {
double[] d = m1.params().toDoubleVector();
double[] d2 = m2.params().toDoubleVector();
for (int i = 0; i < d.length; i++) {
if (Math.random() < .5)
d[i] += d2[i];
}
return Nd4j.create(d);
}
static int getHighestOutput(INDArray output) {
int x = 0;
for (int i = 0; i < 7; i++) {
if (output.getDouble(i) > output.getDouble(x))
x = i;
}
return x;
}
static float[] rowsToInput(byte[][] rows) {
float[] f = new float[7 * 6];
for (int i = 0; i < 6; i++) {
for (int j = 0; j < 7; j++) {
// f[j + i * 7] = rows[j][i] / 2f;
f[j + i * 7] = (rows[j][i] == 0 ? .5f : rows[j][i] == 1 ? 0f : 1f);
}
}
return f;
}
public void saveWeights() {
log.info("Saving model");
for (int i = 0; i < models.length; i++) {
File resourcesDirectory = new File("src/resources/model" + i);
try {
models[i].save(resourcesDirectory, true);
} catch (IOException e) {
e.printStackTrace();
}
}
}
public void loadWeights() {
if (new File("src/resources/model0").exists()) {
for (int i = 0; i < models.length; i++) {
File resourcesDirectory = new File("src/resources/model" + i);
try {
models[i] = MultiLayerNetwork.load(resourcesDirectory, true);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
System.out.println("col: " + models[0].params().shapeInfoToString());
}
public VGFrame getFrame() {
return frame;
}
}
VGFrame
import java.awt.Color;
import java.awt.Graphics;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.BorderFactory;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JTextField;
public class VGFrame extends JFrame {
JTextField iterations;
/**
*
*/
private static final long serialVersionUID = 1L;
public VGFrame() {
super("Vier Gewinnt");
this.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
this.setSize(1300, 800);
this.setVisible(true);
JPanel panelGame = new JPanel();
panelGame.setBorder(BorderFactory.createLineBorder(Color.black, 2));
this.add(panelGame);
var handler = new Handler();
var menuHandler = new MenuHandler();
JButton b1 = new JButton("1");
JButton b2 = new JButton("2");
JButton b3 = new JButton("3");
JButton b4 = new JButton("4");
JButton b5 = new JButton("5");
JButton b6 = new JButton("6");
JButton b7 = new JButton("7");
b1.addActionListener(handler);
b2.addActionListener(handler);
b3.addActionListener(handler);
b4.addActionListener(handler);
b5.addActionListener(handler);
b6.addActionListener(handler);
b7.addActionListener(handler);
panelGame.add(b1);
panelGame.add(b2);
panelGame.add(b3);
panelGame.add(b4);
panelGame.add(b5);
panelGame.add(b6);
panelGame.add(b7);
JButton buttonTrain = new JButton("Train");
JButton buttonNewGame = new JButton("New Game");
JButton buttonSave = new JButton("Save Weights");
JButton buttonLoad = new JButton("Load Weights");
iterations = new JTextField("1000");
buttonTrain.addActionListener(menuHandler);
buttonNewGame.addActionListener(menuHandler);
buttonSave.addActionListener(menuHandler);
buttonLoad.addActionListener(menuHandler);
iterations.addActionListener(menuHandler);
panelGame.add(iterations);
panelGame.add(buttonTrain);
panelGame.add(buttonNewGame);
panelGame.add(buttonSave);
panelGame.add(buttonLoad);
this.validate();
}
@Override
public void paint(Graphics g) {
super.paint(g);
if (Main.current.mainGame.rows == null)
return;
var rows = Main.current.mainGame.rows;
for (int i = 0; i < rows.length; i++) {
for (int j = 0; j < rows[0].length; j++) {
if (rows[i][j] == 0)
break;
g.setColor((rows[i][j] == 1 ? Color.yellow : Color.red));
g.fillOval(80 + 110 * i, 650 - 110 * j, 100, 100);
}
}
}
public void update() {
}
}
class Handler implements ActionListener {
@Override
public void actionPerformed(ActionEvent event) {
if (Main.current.mainGame.playersTurn)
Main.current.addChip(Integer.parseInt(event.getActionCommand()) - 1, true);
}
}
class MenuHandler implements ActionListener {
@Override
public void actionPerformed(ActionEvent event) {
switch (event.getActionCommand()) {
case "New Game":
Main.current.newGame();
break;
case "Train":
Main.current.startTraining(Integer.parseInt(Main.current.getFrame().iterations.getText()));
break;
case "Save Weights":
Main.current.saveWeights();
break;
case "Load Weights":
Main.current.loadWeights();
break;
}
}
}
Game
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
public class Game {
int turnNumber = 0;
byte[][] rows = new byte[7][6];
boolean playersTurn = true;
int gameState = 0; // 0:running, 1:Player1, 2:Player2, 3:Draw
public boolean isRunning() {
return this.gameState == 0;
}
public void addChip(int x, boolean player1) {
turnNumber++;
byte b = nextRow(x);
if (b == 6) {
gameState = player1 ? 2 : 1;
return;
}
rows[x][b] = (byte) (player1 ? 1 : 2);
gameState = checkWinner(x, b);
}
private byte nextRow(int x) {
for (byte i = 0; i < rows[x].length; i++) {
if (rows[x][i] == 0)
return i;
}
return 6;
}
// 0 continue, 1 Player won, 2 ai won, 3 Draw
private int checkWinner(int x, int y) {
int color = rows[x][y];
// Vertikal
if (getCount(x, y, 1, 0) + getCount(x, y, -1, 0) >= 3)
return rows[x][y];
// Horizontal
if (getCount(x, y, 0, 1) + getCount(x, y, 0, -1) >= 3)
return rows[x][y];
// Diagonal1
if (getCount(x, y, 1, 1) + getCount(x, y, -1, -1) >= 3)
return rows[x][y];
// Diagonal2
if (getCount(x, y, -1, 1) + getCount(x, y, 1, -1) >= 3)
return rows[x][y];
for (byte[] bs : rows) {
for (byte s : bs) {
if (s == 0)
return 0;
}
}
return 3; // Draw
}
private int getCount(int x, int y, int dirX, int dirY) {
int color = rows[x][y];
int count = 0;
while (true) {
x += dirX;
y += dirY;
if (x < 0 | x > 6 | y < 0 | y > 5)
break;
if (color != rows[x][y])
break;
count++;
}
return count;
}
public void playFullGame(MultiLayerNetwork m1, MultiLayerNetwork m2) {
boolean player1 = true;
while (this.gameState == 0) {
float[] f = Main.rowsToInput(this.rows);
INDArray input = Nd4j.create(f);
this.addChip(Main.getHighestOutput(player1 ? m1.output(input) : m2.output(input)), player1);
player1 = !player1;
}
}
}
With a quick look, and based on the analysis of your multiplier variants, it seems like the NaN is produced by an arithmetic underflow, caused by your gradients being too small (too close to absolute 0).
This is the most suspicious part of the code:
f[j + i * 7] = (rows[j][i] == 0 ? .5f : rows[j][i] == 1 ? 0f : 1f);
If rows[j][i] == 1 then 0f is stored. I don't know how this is managed by the neural network (or even java), but mathematically speaking, a finite-sized float cannot include zero.
Even if your code would alter the 0f with some extra salt, those array values' resultants would have some risk of becoming too close to zero. Due to limited precision when representing real numbers, values very close to zero can not be represented, hence the NaN.
These values have a very friendly name: subnormal numbers.
Any non-zero number with magnitude smaller than the smallest normal
number is subnormal.
IEEE_754
As with IEEE 754-1985, the standard recommends 0 for signaling NaNs and 1 for quiet NaNs, so that a signaling NaN can be quieted by changing only this bit to 1, while the reverse could yield the encoding of an infinity.
Above's text is important here: according to the standard, you are actually specifying a NaN with any 0f value stored.
Even though the name is misleading, Float.MIN_VALUE is a positive value, greater than 0:
The real minimum float value is, in fact: -Float.MAX_VALUE.
Is floating point math subnormal?
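The statements about `Float.MIN_VALUE`, `-Float.MAX_VALUE` and subnormal numbers can be checked with a few lines of plain Java. A minimal sketch using only the standard `java.lang.Float` constants; the class name `FloatLimits` and its method names are made up for illustration.

```java
// Hypothetical demo class; only standard java.lang.Float constants are used.
final class FloatLimits {
    /** Float.MIN_VALUE is the smallest positive float, not the most negative one. */
    static boolean minValueIsPositive() {
        return Float.MIN_VALUE > 0f;
    }

    /** The most negative finite float is -Float.MAX_VALUE. */
    static float mostNegativeFinite() {
        return -Float.MAX_VALUE;
    }

    /** Non-zero and smaller in magnitude than the smallest normal float. */
    static boolean isSubnormal(float x) {
        return x != 0f && Math.abs(x) < Float.MIN_NORMAL;
    }

    public static void main(String[] args) {
        System.out.println("MIN_VALUE > 0: " + minValueIsPositive());
        System.out.println("most negative finite: " + mostNegativeFinite());
        System.out.println("MIN_VALUE is subnormal: " + isSubnormal(Float.MIN_VALUE));
        System.out.println("MIN_NORMAL is subnormal: " + isSubnormal(Float.MIN_NORMAL));
    }
}
```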
Normalizing the gradients
If you verify that the issue is caused only by the 0f values, you could replace them with other values that represent something similar, such as Float.MIN_VALUE or Float.MIN_NORMAL. Do something like this in any other part of the code where the same scenario could happen. Take these just as examples, and play with these ranges:
rows[j][i] == 1 ? Float.MIN_VALUE : 1f;
rows[j][i] == 1 ? Float.MIN_NORMAL : Float.MAX_VALUE/2;
rows[j][i] == 1 ? -Float.MAX_VALUE/2 : Float.MAX_VALUE/2;
Even so, this could also lead to a NaN, based on how these values are altered.
If so, you should normalize the values. You could try applying a GradientNormalizer for this. In your network initialization, something like this should be defined for each layer (or for those that are problematic):
new NeuralNetConfiguration
.Builder()
.weightInit(WeightInit.XAVIER)
(...)
.layer(new DenseLayer.Builder().nIn(42).nOut(30).activation(Activation.RELU)
.weightInit(WeightInit.XAVIER)
.gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) //this
.build())
(...)
There are different normalizers, so choose which one fits your schema best, and which layers should include one. The options are:
GradientNormalization
RenormalizeL2PerLayer
Rescale gradients by dividing by the L2 norm
of all gradients for the layer.
RenormalizeL2PerParamType
Rescale gradients by dividing by the L2
norm of the gradients, separately for each type of parameter within
the layer. This differs from RenormalizeL2PerLayer in that here, each
parameter type (weight, bias etc) is normalized separately. For
example, in a MLP/FeedForward network (where G is the gradient
vector), the output is as follows:
GOut_weight = G_weight / l2(G_weight)
GOut_bias = G_bias / l2(G_bias)
ClipElementWiseAbsoluteValue
Clip the gradients on a per-element
basis. For each gradient g, set g <- sign(g) * min(maxAllowedValue, |g|),
i.e., if a parameter gradient has absolute value greater than the
threshold, truncate it. For example, if threshold = 5, then values in
range -5<g<5 are unmodified; values <-5 are set to -5; values >5 are
set to 5.
ClipL2PerLayer
Conditional renormalization. Somewhat similar to
RenormalizeL2PerLayer, this strategy scales the gradients if and only
if the L2 norm of the gradients (for entire layer) exceeds a specified
threshold. Specifically, if G is gradient vector for the layer, then:
GOut = G if l2Norm(G) < threshold (i.e., no change)
GOut = threshold * G / l2Norm(G) otherwise
ClipL2PerParamType
Conditional renormalization. Very
similar to ClipL2PerLayer, however instead of clipping per layer, do
clipping on each parameter type separately. For example in a recurrent
neural network, input weight gradients, recurrent weight gradients and
bias gradient are all clipped separately.
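To make two of these strategies concrete, here is a minimal sketch that applies them to a plain `double[]`. It follows the prose descriptions above (truncating each element to the threshold, and rescaling only when the L2 norm exceeds the threshold). It is not deeplearning4j's implementation, and the class and method names are assumptions made for the example.

```java
// Illustrative only: re-implements the quoted descriptions on a plain array.
final class GradientClipSketch {
    /** L2 norm of a gradient vector. */
    static double l2Norm(double[] g) {
        double sum = 0.0;
        for (double v : g) {
            sum += v * v;
        }
        return Math.sqrt(sum);
    }

    /** Element-wise clipping: every value is truncated to [-threshold, threshold]. */
    static double[] clipElementWise(double[] g, double threshold) {
        double[] out = new double[g.length];
        for (int i = 0; i < g.length; i++) {
            out[i] = Math.signum(g[i]) * Math.min(threshold, Math.abs(g[i]));
        }
        return out;
    }

    /** Conditional renormalization: rescale only if the L2 norm exceeds the threshold. */
    static double[] clipL2(double[] g, double threshold) {
        double norm = l2Norm(g);
        if (norm <= threshold) {
            return g.clone(); // no change
        }
        double[] out = new double[g.length];
        for (int i = 0; i < g.length; i++) {
            out[i] = threshold * g[i] / norm;
        }
        return out;
    }
}
```

With a threshold of 5, `clipElementWise` leaves 3 untouched and maps -7 and 9 to -5 and 5, matching the worked example in the description.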
Here you can find a complete example of the application of these GradientNormalizers.
I think I finally figured it out. I was trying to visualize the network using deeplearning4j-ui, but got some incompatible-version errors. After changing versions I got a new error stating that the network's input is expected to be a 2D array, and I found on the internet that this is expected across all versions.
So I changed
float[] f = new float[7 * 6];
Nd4j.create(f);
to
float[][] f = new float[1][7 * 6];
Nd4j.createFromArray(f);
And the NaN values finally disappeared. @aran So I guess assuming incorrect inputs was definitely the right direction. Thank you so much for your help :)
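The change above amounts to feeding the board as a batch of one row with 42 features instead of a flat array. A minimal sketch of that conversion in plain Java, using the same cell encoding as `rowsToInput` in the question; the class and method names are made up, and the Nd4j call is left out so the snippet runs without the library. The result is what would then be passed to `Nd4j.createFromArray`.

```java
// Hypothetical helper: builds a [1][42] input (one sample, 42 features).
final class BoardInput {
    static float[][] toBatchOfOne(byte[][] rows) {
        int columns = rows.length;    // 7 in the question
        int height = rows[0].length;  // 6 in the question
        float[][] batch = new float[1][columns * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < columns; x++) {
                byte cell = rows[x][y];
                // Same encoding as the question: empty = .5, player 1 = 0, player 2 = 1
                batch[0][x + y * columns] = cell == 0 ? .5f : cell == 1 ? 0f : 1f;
            }
        }
        return batch;
    }
}
```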
Related
To improve my knowledge of imaging and get some experience working with the topics, I decided to create a license plate recognition algorithm on the Android platform.
The first step is detection, for which I decided to implement a recent paper titled "A Robust and Efficient Approach to License Plate Detection". The paper presents their idea very well and uses quite simple techniques to achieve detection. Besides some details lacking in the paper, I implemented the bilinear downsampling, converting to gray scale, and the edging + adaptive thresholding as described in Section 3A, 3B.1, and 3B.2.
Unfortunately, I am not getting the output this paper presents in e.g. figure 3 and 6.
The image I use for testing is as follows:
The gray scale (and downsampled) version looks fine (see the bottom of this post for the actual implementation), I used a well-known combination of the RGB components to produce it (paper does not mention how, so I took a guess).
Next is the initial edge detection using the Sobel filter outlined. This produces an image similar to the ones presented in figure 6 of the paper.
And finally, to remove the "weak edges" they apply adaptive thresholding using a 20x20 window. Here is where things go wrong.
As you can see, it does not function properly, even though I am using their stated parameter values. Additionally I have tried:
Changing the beta parameter.
Using a 2D int array instead of Bitmap objects to simplify creating the integral image.
Trying a higher Gamma parameter so the initial edge detection allows more "edges".
Changing the window to e.g. 10x10.
Yet none of the changes made an improvement; it keeps producing images like the one above. My question is: what am I doing differently from what is outlined in the paper, and how can I get the desired output?
Code
The (cleaned) code I use:
public int[][] toGrayscale(Bitmap bmpOriginal) {
int width = bmpOriginal.getWidth();
int height = bmpOriginal.getHeight();
// color information
int A, R, G, B;
int pixel;
int[][] greys = new int[width][height];
// scan through all pixels
for (int x = 0; x < width; ++x) {
for (int y = 0; y < height; ++y) {
// get pixel color
pixel = bmpOriginal.getPixel(x, y);
R = Color.red(pixel);
G = Color.green(pixel);
B = Color.blue(pixel);
int gray = (int) (0.2989 * R + 0.5870 * G + 0.1140 * B);
greys[x][y] = gray;
}
}
return greys;
}
The code for edge detection:
private int[][] detectEges(int[][] detectionBitmap) {
int width = detectionBitmap.length;
int height = detectionBitmap[0].length;
int[][] edges = new int[width][height];
// Loop over all pixels in the bitmap
int c1 = 0;
int c2 = 0;
for (int y = 0; y < height; y++) {
for (int x = 2; x < width -2; x++) {
// Calculate d0 for each pixel
int p0 = detectionBitmap[x][y];
int p1 = detectionBitmap[x-1][y];
int p2 = detectionBitmap[x+1][y];
int p3 = detectionBitmap[x-2][y];
int p4 = detectionBitmap[x+2][y];
int d0 = Math.abs(p1 + p2 - 2*p0) + Math.abs(p3 + p4 - 2*p0);
if(d0 >= Gamma) {
c1++;
edges[x][y] = Gamma;
} else {
c2++;
edges[x][y] = d0;
}
}
}
return edges;
}
The code for adaptive thresholding. The SAT implementation is taken from here:
private int[][] AdaptiveThreshold(int[][] detectionBitmap) {
// Create the integral image
processSummedAreaTable(detectionBitmap);
int width = detectionBitmap.length;
int height = detectionBitmap[0].length;
int[][] binaryImage = new int[width][height];
int white = 0;
int black = 0;
int h_w = 20; // The window size
int half = h_w/2;
// Loop over all pixels in the bitmap
for (int y = half; y < height - half; y++) {
for (int x = half; x < width - half; x++) {
// Calculate d0 for each pixel
int sum = 0;
for(int k = -half; k < half - 1; k++) {
for (int j = -half; j < half - 1; j++) {
sum += detectionBitmap[x + k][y + j];
}
}
if(detectionBitmap[x][y] >= (sum / (h_w * h_w)) * Beta) {
binaryImage[x][y] = 255;
white++;
} else {
binaryImage[x][y] = 0;
black++;
}
}
}
return binaryImage;
}
/**
* Process given matrix into its summed area table (in-place)
* O(MN) time, O(1) space
* @param matrix source matrix
*/
private void processSummedAreaTable(int[][] matrix) {
int rowSize = matrix.length;
int colSize = matrix[0].length;
for (int i=0; i<rowSize; i++) {
for (int j=0; j<colSize; j++) {
matrix[i][j] = getVal(i, j, matrix);
}
}
}
/**
* Helper method for processSummedAreaTable
* @param row current row number
* @param col current column number
* @param matrix source matrix
* @return sub-matrix sum
*/
private int getVal (int row, int col, int[][] matrix) {
int leftSum; // sub matrix sum of left matrix
int topSum; // sub matrix sum of top matrix
int topLeftSum; // sub matrix sum of top left matrix
int curr = matrix[row][col]; // current cell value
/* top left value is itself */
if (row == 0 && col == 0) {
return curr;
}
/* top row */
else if (row == 0) {
leftSum = matrix[row][col - 1];
return curr + leftSum;
}
/* left-most column */
if (col == 0) {
topSum = matrix[row - 1][col];
return curr + topSum;
}
else {
leftSum = matrix[row][col - 1];
topSum = matrix[row - 1][col];
topLeftSum = matrix[row - 1][col - 1]; // overlap between leftSum and topSum
return curr + leftSum + topSum - topLeftSum;
}
}
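For comparison, a summed-area table is usually kept separate from the source image and queried with four corner lookups per window, rather than being summed entry by entry. The sketch below shows that general technique. It is not taken from the paper, the names are made up, and whether this accounts for the output shown above is not confirmed by the post.

```java
// Illustrative integral-image helper; the table has one extra row and column of zeros.
final class IntegralImageSketch {
    /** Builds the summed-area table without modifying the source array. */
    static long[][] build(int[][] src) {
        int w = src.length;
        int h = src[0].length;
        long[][] sat = new long[w + 1][h + 1];
        for (int x = 1; x <= w; x++) {
            for (int y = 1; y <= h; y++) {
                sat[x][y] = src[x - 1][y - 1] + sat[x - 1][y] + sat[x][y - 1] - sat[x - 1][y - 1];
            }
        }
        return sat;
    }

    /** Sum of the source values for x in [x0, x1] and y in [y0, y1], both inclusive. */
    static long windowSum(long[][] sat, int x0, int y0, int x1, int y1) {
        return sat[x1 + 1][y1 + 1] - sat[x0][y1 + 1] - sat[x1 + 1][y0] + sat[x0][y0];
    }
}
```

The local mean for a window is then `windowSum(...)` divided by the number of pixels in the window, and the comparison against Beta times that mean uses the original pixel value, not the table entry.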
Marvin provides an approach to find text regions. Perhaps it can be a starting point for you:
Find Text Regions in Images:
http://marvinproject.sourceforge.net/en/examples/findTextRegions.html
This approach was also used in this question:
How do I separates text region from image in java
Using your image I got this output:
Source Code:
package textRegions;
import static marvin.MarvinPluginCollection.findTextRegions;
import java.awt.Color;
import java.util.List;
import marvin.image.MarvinImage;
import marvin.image.MarvinSegment;
import marvin.io.MarvinImageIO;
public class FindVehiclePlate {
public FindVehiclePlate() {
MarvinImage image = MarvinImageIO.loadImage("./res/vehicle.jpg");
image = findText(image, 30, 20, 100, 170);
MarvinImageIO.saveImage(image, "./res/vehicle_out.png");
}
public MarvinImage findText(MarvinImage image, int maxWhiteSpace, int maxFontLineWidth, int minTextWidth, int grayScaleThreshold){
List<MarvinSegment> segments = findTextRegions(image, maxWhiteSpace, maxFontLineWidth, minTextWidth, grayScaleThreshold);
for(MarvinSegment s:segments){
if(s.height >= 10){
s.y1-=20;
s.y2+=20;
image.drawRect(s.x1, s.y1, s.x2-s.x1, s.y2-s.y1, Color.red);
image.drawRect(s.x1+1, s.y1+1, (s.x2-s.x1)-2, (s.y2-s.y1)-2, Color.red);
image.drawRect(s.x1+2, s.y1+2, (s.x2-s.x1)-4, (s.y2-s.y1)-4, Color.red);
}
}
return image;
}
public static void main(String[] args) {
new FindVehiclePlate();
}
}
I am trying to extract the user's silhouette and put it above my images. I was able to make a mask and cut the user from the RGB image, but the contour is messy.
The question is how I can make the mask more precise (to fit the real user). I've tried ERODE-DILATE filters, but they don't do much. Maybe I need some feather filter like in Photoshop. Or I don't know.
Here is my code.
import SimpleOpenNI.*;
SimpleOpenNI context;
PImage mask;
void setup()
{
size(640*2, 480);
context = new SimpleOpenNI(this);
if (context.isInit() == false)
{
exit();
return;
}
context.enableDepth();
context.enableRGB();
context.enableUser();
context.alternativeViewPointDepthToImage();
}
void draw()
{
frame.setTitle(int(frameRate) + " fps");
context.update();
int[] userMap = context.userMap();
background(0, 0, 0);
mask = loadImage("black640.jpg"); //just a black image
int xSize = context.depthWidth();
int ySize = context.depthHeight();
mask.loadPixels();
for (int y = 0; y < ySize; y++) {
for (int x = 0; x < xSize; x++) {
int index = x + y*xSize;
if (userMap[index]>0) {
mask.pixels[index]=color(255, 255, 255);
}
}
}
mask.updatePixels();
image(mask, 0, 0);
mask.filter(DILATE);
mask.filter(DILATE);
PImage rgb = context.rgbImage();
rgb.mask(mask);
image(rgb, context.depthWidth() + 10, 0);
}
It's good you're aligning the RGB and depth streams.
There are a few things that could be improved in terms of efficiency:
No need to reload a black image every single frame (in the draw() loop) since you're modifying all the pixels anyway:
mask = loadImage("black640.jpg"); //just a black image
Also, since you don't need the x,y coordinates as you loop through the user data, you can use a single for loop which should be a bit faster:
for(int i = 0 ; i < numPixels ; i++){
mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
}
instead of:
for (int y = 0; y < ySize; y++) {
for (int x = 0; x < xSize; x++) {
int index = x + y*xSize;
if (userMap[index]>0) {
mask.pixels[index]=color(255, 255, 255);
}
}
}
Another hacky thing you could do is retrieve the userImage() from SimpleOpenNI, instead of the userMap(), and apply a THRESHOLD filter to it, which in theory should give you the same result as above.
For example:
int[] userMap = context.userMap();
background(0, 0, 0);
mask = loadImage("black640.jpg"); //just a black image
int xSize = context.depthWidth();
int ySize = context.depthHeight();
mask.loadPixels();
for (int y = 0; y < ySize; y++) {
for (int x = 0; x < xSize; x++) {
int index = x + y*xSize;
if (userMap[index]>0) {
mask.pixels[index]=color(255, 255, 255);
}
}
}
could be:
mask = context.userImage();
mask.filter(THRESHOLD);
In terms of filtering, if you want to shrink the silhouette you should ERODE, and blurring should give you a bit of that Photoshop-like feathering.
Note that some filter() calls take arguments (like BLUR) but others, like the ERODE/DILATE morphological filters, don't; you can still roll your own loops to deal with that.
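To illustrate what repeated erosion does to a mask, here is a plain-Java sketch that works on an `int[]` of 0/255 values. It is not Processing's ERODE implementation: the 4-neighbourhood rule, the treatment of image borders, and all names are assumptions made for the example.

```java
// Illustrative erosion on a binary mask stored row by row (0 = black, non-zero = white).
final class MaskErodeSketch {
    /** Applies 'amount' erosion passes; a pixel stays white only if its 4 neighbours are white. */
    static int[] erode(int[] mask, int width, int height, int amount) {
        int[] current = mask.clone();
        for (int pass = 0; pass < amount; pass++) {
            int[] next = new int[current.length];
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    int i = x + y * width;
                    // Neighbours outside the image are ignored rather than treated as black.
                    boolean keep = current[i] != 0
                            && (x == 0 || current[i - 1] != 0)
                            && (x == width - 1 || current[i + 1] != 0)
                            && (y == 0 || current[i - width] != 0)
                            && (y == height - 1 || current[i + width] != 0);
                    next[i] = keep ? 255 : 0;
                }
            }
            current = next;
        }
        return current;
    }
}
```

Each pass peels one pixel off the outline, which is why a loop count works as an "amount" parameter for filters that take no argument.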
I also recommend having some sort of easy to tweak interface (it can be fancy slider or a simple keyboard shortcut) when playing with filters.
Here's a rough attempt at the refactored sketch with the above comments:
import SimpleOpenNI.*;
SimpleOpenNI context;
PImage mask;
int numPixels = 640*480;
int dilateAmt = 1;
int erodeAmt = 1;
int blurAmt = 0;
void setup()
{
size(640*2, 480);
context = new SimpleOpenNI(this);
if (context.isInit() == false)
{
exit();
return;
}
context.enableDepth();
context.enableRGB();
context.enableUser();
context.alternativeViewPointDepthToImage();
mask = createImage(640,480,RGB);
}
void draw()
{
frame.setTitle(int(frameRate) + " fps");
context.update();
int[] userMap = context.userMap();
background(0, 0, 0);
//you don't need to keep reloading the image every single frame since you're updating all the pixels below anyway
// mask = loadImage("black640.jpg"); //just a black image
// mask.loadPixels();
// int xSize = context.depthWidth();
// int ySize = context.depthHeight();
// for (int y = 0; y < ySize; y++) {
// for (int x = 0; x < xSize; x++) {
// int index = x + y*xSize;
// if (userMap[index]>0) {
// mask.pixels[index]=color(255, 255, 255);
// }
// }
// }
//a single loop is usually faster than a nested loop and you don't need the x,y coordinates anyway
mask.loadPixels();
for(int i = 0 ; i < numPixels ; i++){
mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
}
//commit the pixel changes before running the filters
mask.updatePixels();
//erode
for(int i = 0 ; i < erodeAmt ; i++) mask.filter(ERODE);
//dilate
for(int i = 0 ; i < dilateAmt; i++) mask.filter(DILATE);
//blur (skipped when the amount is 0)
if(blurAmt > 0) mask.filter(BLUR,blurAmt);
//preview the mask after you process it
image(mask, 0, 0);
PImage rgb = context.rgbImage();
rgb.mask(mask);
image(rgb, context.depthWidth() + 10, 0);
//print filter values for debugging purposes
fill(255);
text("erodeAmt: " + erodeAmt + "\tdilateAmt: " + dilateAmt + "\tblurAmt: " + blurAmt,15,15);
}
void keyPressed(){
if(key == 'e') erodeAmt--;
if(key == 'E') erodeAmt++;
if(key == 'd') dilateAmt--;
if(key == 'D') dilateAmt++;
if(key == 'b') blurAmt--;
if(key == 'B') blurAmt++;
//constrain values
if(erodeAmt < 0) erodeAmt = 0;
if(dilateAmt < 0) dilateAmt = 0;
if(blurAmt < 0) blurAmt = 0;
}
Unfortunately I can't test with an actual sensor right now, so please use the concepts explained, but bear in mind the full sketch code isn't tested.
The above sketch (if it runs) should allow you to use keys to control the filter parameters (e/E to decrease/increase erosion, d/D for dilation, b/B for blur). Hopefully you'll get satisfactory results.
When working with SimpleOpenNI in general I advise recording an .oni file (check out the RecorderPlay example for that) of a person for the most common use case. This will save you some time in the long run when testing and will allow you to work remotely with the sensor detached. One thing to bear in mind: the depth resolution is reduced to half on recordings (but using a usingRecording boolean flag should keep things safe).
The last and probably most important point is about the quality of the end result. Your resulting image can't be that much better if the source image isn't easy to work with to begin with. The depth data from the original Kinect sensor isn't great. The Asus sensors feel a wee bit more stable, but still the difference is negligible in most cases. If you are going to stick to one of these sensors, make sure you've got a clear background and decent lighting, without too much direct warm light (sunlight, incandescent lightbulbs, etc.), since it may interfere with the sensor.
If you want a more accurate user cut and the above filtering doesn't get the results you're after, consider switching to a better sensor like KinectV2. The depth quality is much better and the sensor is less susceptible to direct warm light. This may mean you need to use Windows (I see there's a KinectPV2 wrapper available) or openFrameworks (a C++ collection of libraries similar to Processing) with ofxKinectV2.
I've tried the built-in erode, dilate and blur filters in Processing, but they are very inefficient. Every time I increment blurAmount in img.filter(BLUR,blurAmount), my FPS decreases by 5 frames.
So I decided to try OpenCV. It is much better in comparison, and the result is satisfactory.
import SimpleOpenNI.*;
import processing.video.*;
import gab.opencv.*;
SimpleOpenNI context;
OpenCV opencv;
PImage mask;
int numPixels = 640*480;
int dilateAmt = 1;
int erodeAmt = 1;
int blurAmt = 1;
Movie mov;
void setup(){
opencv = new OpenCV(this, 640, 480);
size(640*2, 480);
context = new SimpleOpenNI(this);
if (context.isInit() == false) {
exit();
return;
}
context.enableDepth();
context.enableRGB();
context.enableUser();
context.alternativeViewPointDepthToImage();
mask = createImage(640, 480, RGB);
mov = new Movie(this, "wild.mp4");
mov.play();
mov.speed(5);
mov.volume(0);
}
void movieEvent(Movie m) {
m.read();
}
void draw() {
frame.setTitle(int(frameRate) + " fps");
context.update();
int[] userMap = context.userMap();
background(0, 0, 0);
mask.loadPixels();
for (int i = 0; i < numPixels; i++) {
mask.pixels[i] = userMap[i] > 0 ? color(255) : color(0);
}
mask.updatePixels();
opencv.loadImage(mask);
opencv.gray();
for (int i = 0; i < erodeAmt; i++) {
opencv.erode();
}
for (int i = 0; i < dilateAmt; i++) {
opencv.dilate();
}
if (blurAmt>0) {//blur with 0 amount causes error
opencv.blur(blurAmt);
}
mask = opencv.getSnapshot();
image(mask, 0, 0);
PImage rgb = context.rgbImage();
rgb.mask(mask);
image(mov, context.depthWidth() + 10, 0);
image(rgb, context.depthWidth() + 10, 0);
fill(255);
text("erodeAmt: " + erodeAmt + "\tdilateAmt: " + dilateAmt + "\tblurAmt: " + blurAmt, 15, 15);
}
void keyPressed() {
if (key == 'e') erodeAmt--;
if (key == 'E') erodeAmt++;
if (key == 'd') dilateAmt--;
if (key == 'D') dilateAmt++;
if (key == 'b') blurAmt--;
if (key == 'B') blurAmt++;
//constrain values
if (erodeAmt < 0) erodeAmt = 0;
if (dilateAmt < 0) dilateAmt = 0;
if (blurAmt < 0) blurAmt = 0;
}
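To make it clearer what the erode and dilate passes do to the user mask, here is a minimal plain-Java sketch on a boolean mask. It is not the OpenCV implementation: the class name MaskMorphology, the fixed 3x3 neighbourhood, and the choice to ignore pixels outside the image are all assumptions made for illustration.

```java
public class MaskMorphology {

    // Erosion: a pixel stays set only if every in-bounds 3x3 neighbour is set.
    static boolean[][] erode(boolean[][] mask) {
        return apply(mask, true);
    }

    // Dilation: a pixel becomes set if any in-bounds 3x3 neighbour is set.
    static boolean[][] dilate(boolean[][] mask) {
        return apply(mask, false);
    }

    private static boolean[][] apply(boolean[][] mask, boolean requireAll) {
        int h = mask.length, w = mask[0].length;
        boolean[][] out = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                boolean result = requireAll;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int ny = y + dy, nx = x + dx;
                        // Assumption: pixels outside the image are simply ignored.
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (requireAll) result &= mask[ny][nx];
                        else result |= mask[ny][nx];
                    }
                }
                out[y][x] = result;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[][] mask = new boolean[7][7];
        for (int y = 2; y <= 4; y++)
            for (int x = 2; x <= 4; x++)
                mask[y][x] = true;   // a 3x3 "user" blob
        mask[0][6] = true;           // a single-pixel speck of noise

        // Erode first, then dilate, in the same order as the sketch above.
        boolean[][] opened = dilate(erode(mask));
        System.out.println("blob centre kept: " + opened[3][3]);
        System.out.println("speck removed: " + !opened[0][6]);
    }
}
```

Eroding before dilating removes specks smaller than the neighbourhood while restoring larger blobs to roughly their original size, which is why that order suits a noisy user mask.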
I am trying to create a little game in Java, but I'm stuck.
When I draw a map, I'm not able to display the characters without overwriting the tilesets of the squares.
My goal is to display several pictures on the same square (for example the grass tileset, the character, and a tree), so I have to deal with the transparency of my pictures (this is not the problem) and with the layering (this is the problem).
So how can I display an image on top of another image?
How can I tell Java that I need to display this image above or below another image?
This is my source code; I don't know whether it will help. A clue, or a function that can manage the layers, would be really helpful. There is no need to rewrite all the code for me x)
This program is not complete; I only use it for testing right now. I know that it refreshes the map twice, so it overwrites the character's square (and it has many other little glitches), but that is not the point of my question. I'm trying to build my game step by step!
import javax.swing.JFrame;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import javax.swing.SwingUtilities;
public class Window extends Thread
{
private static JFrame window = new JFrame("game");
public void run()
{
Map map = new Map();
Characters characters = new Characters();
window.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
window.setSize(Settings.sizeX, Settings.sizeY);
window.setLocationRelativeTo(null);
window.setResizable(false);
window.setVisible(true);
map.start();
characters.start();
}
private static void reload() throws Exception
{
SwingUtilities.updateComponentTreeUI(window);
}
private static class Map extends Thread
{
private int numberSquareX = Settings.sizeX / 20 + 1;
private int numberSquareY = Settings.sizeY / 20 + 1;
private JLabel square[][] = new JLabel[numberSquareX][numberSquareY];
public void run()
{
for (int x = 0, y = 0; y < numberSquareY; x++)
{
square[x][y] = new JLabel(new ImageIcon("grass_1.png"));
square[x][y].setBounds(x * 20, y * 20, 20, 20);
window.add(square[x][y]);
if (x == numberSquareX - 1)
{
y++;
x = -1;
}
}
square[numberSquareX - 1][numberSquareY - 1] = new JLabel(new ImageIcon("grass_1.png"));
square[numberSquareX - 1][numberSquareY - 1].setBounds(numberSquareX * 20, numberSquareY * 20, 20, 20);
window.add(square[numberSquareX - 1][numberSquareY - 1]);
try
{
reload();
}
catch (Exception e)
{
}
return;
}
}
private class Characters extends Thread
{
private JLabel square[][] = new JLabel[1][1];
public void run()
{
square[0][0] = new JLabel(new ImageIcon("character_1.png"));
square[0][0].setBounds(Test.posX, Test.posX, 20, 20);
window.add(square[0][0]);
try
{
reload();
}
catch (Exception e)
{
}
return;
}
}
}
I have already found these questions: "How to use JLayered Pane to display an image on top of another image?" and "Best practice for creating a composite image output for Java Swing", but they haven't really helped me...
I will keep searching for the answer. If I find it, I will come back and post it here.
Solved.
Thanks to MadProgrammer for his comments.
Render the tile map to a BufferedImage, either in its entirety or based on the available viewable area, whichever is more efficient. Paint this to the screen, then paint your character on top of it – MadProgrammer
In 15+ years of professional Java/Swing development, I've never found a need to use SwingUtilities.updateComponentTreeUI(window); instead, simply call repaint on the component which is responsible for rendering the output. I'm pretty sure you'll find this more efficient. – MadProgrammer
Swing is also a single-threaded environment AND is not thread safe. You should NOT update the UI from outside the context of the Event Dispatching Thread, as this will set up a race condition and could result in unwanted and difficult-to-resolve graphical issues. – MadProgrammer
Hint. JLabel is a descendent of Container, which means it can contain other components ;) – MadProgrammer
Thanks a lot MadProgrammer! So I have replaced SwingUtilities.updateComponentTreeUI(window) with window.repaint(). You were right about thread safety: my map had some bugs, but I wasn't able to find where they came from. And what about the BufferedImage? If I create two BufferedImages, is the last one automatically on top of the first one? Or should I just render the tile map to a BufferedImage (so I am limited to 2 layers)? – Celine
It will depend on what it is you want to achieve. Using BufferedImages gives you complete control over the placement of the images and yes, one can be rendered over the other. Painting is like an artist's canvas: as you add things to it, they are added on top of what is already there. BUT, you might find it easier to add a JLabel to another JLabel - just remember, JLabel doesn't have a layout manager by default – MadProgrammer
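For anyone who wants to try the BufferedImage route from the comments, here is a minimal sketch of the "paint in order" idea. The class LayerCompositor and its methods compose and sprite are hypothetical names, and sprite is only a stand-in for loading grass_1.png or character_1.png; it is one possible reading of the advice, not MadProgrammer's code.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class LayerCompositor {
    static final int TILE = 20; // tile size in pixels, matching the 20x20 squares in the question

    // Paints the layers in order: whatever is drawn later ends up on top.
    static BufferedImage compose(int cols, int rows, BufferedImage grass,
                                 BufferedImage character, int charX, int charY) {
        BufferedImage frame =
                new BufferedImage(cols * TILE, rows * TILE, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = frame.createGraphics();
        try {
            for (int y = 0; y < rows; y++)
                for (int x = 0; x < cols; x++)
                    g.drawImage(grass, x * TILE, y * TILE, null); // bottom layer
            g.drawImage(character, charX, charY, null);           // drawn last, so on top
        } finally {
            g.dispose();
        }
        return frame;
    }

    // Hypothetical stand-in for a loaded image: a transparent square with a
    // coloured square inset by `inset` pixels on every side.
    static BufferedImage sprite(int size, int inset, Color color) {
        BufferedImage img = new BufferedImage(size, size, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        g.setColor(color);
        g.fillRect(inset, inset, size - 2 * inset, size - 2 * inset);
        g.dispose();
        return img;
    }

    public static void main(String[] args) {
        BufferedImage grass = sprite(TILE, 0, Color.GREEN);
        BufferedImage hero = sprite(TILE, 5, Color.RED);
        BufferedImage frame = compose(2, 2, grass, hero, TILE, 0);
        System.out.println(Integer.toHexString(frame.getRGB(30, 10))); // inside the character
        System.out.println(Integer.toHexString(frame.getRGB(21, 1)));  // transparent part of the character
    }
}
```

The composed image can then be drawn in a single paintComponent call, and further layers (a tree above the character, for instance) are just more drawImage calls after the character.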
Code example:
import javax.swing.JFrame;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
public class Window extends Thread
{
private static JFrame window = new JFrame("game");
private int numberSquareX = Settings.sizeX / 20 + 1;
private int numberSquareY = Settings.sizeY / 20 + 1;
private JLabel titlesetLayer1[][] = new JLabel[numberSquareX][numberSquareY];
private JLabel titlesetLayer2[] = new JLabel[1];
private JLabel titlesetLayer3[] = new JLabel[1];
private JLabel titlesetLayer4[] = new JLabel[0];
private JLabel titlesetLayer5[] = new JLabel[0];
private JLabel characters[] = new JLabel[2];
public void run()
{
window.setDefaultCloseOperation( JFrame.EXIT_ON_CLOSE );
window.setSize(Settings.sizeX, Settings.sizeY);
window.setLocationRelativeTo(null);
window.setResizable(false);
// draw layer5 (on the layer4)
// draw layer4 (on the layer3)
// draw layer3 (on the characters)
titlesetLayer3[0] = new JLabel(new ImageIcon("tree_1.png"));
titlesetLayer3[0].setBounds(130, 120, 126, 160);
window.add(titlesetLayer3[0]);
// draw the charaters
characters[1] = new JLabel(new ImageIcon("character_1.png"));
characters[1].setBounds(600, 500, 100, 100);
window.add(characters[1]);
characters[0] = new JLabel(new ImageIcon("character_1.png"));
characters[0].setBounds(100, 100, 100, 100);
window.add(characters[0]);
// draw layer2 (under the characters)
titlesetLayer2[0] = new JLabel(new ImageIcon("tree_1.png"));
titlesetLayer2[0].setBounds(570, 400, 126, 160);
window.add(titlesetLayer2[0]);
// draw layer1 (under the layer2)
for (int x = 0, y = 0; y < numberSquareY; x++)
{
titlesetLayer1[x][y] = new JLabel(new ImageIcon("grass_1.png"));
titlesetLayer1[x][y].setBounds(x * 20, y * 20, 20, 20);
window.add(titlesetLayer1[x][y]);
if (x == numberSquareX - 1)
{
y++;
x = -1;
}
}
titlesetLayer1[numberSquareX - 1][numberSquareY - 1] = new JLabel(new ImageIcon("grass_1.png"));
titlesetLayer1[numberSquareX - 1][numberSquareY - 1].setBounds(numberSquareX * 20, numberSquareY * 20, 20, 20);
window.add(titlesetLayer1[numberSquareX - 1][numberSquareY - 1]);
window.setVisible(true);
// window.repaint();
}
}
Another solution is to use JLayeredPane!
JLayeredPane layers = new JLayeredPane();
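// Pass the layer as an Integer: add(Component, int) treats the number as a position, not a layer
layers.add(tilesetsUnderCharacter, Integer.valueOf(0)); // Layer 0 (bottom)
layers.add(character, Integer.valueOf(1)); // Layer 1
layers.add(tilesetsOnCharacter, Integer.valueOf(2)); // Layer 2 (top)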
frame.setContentPane(layers);
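One thing worth checking with JLayeredPane is that the layer number really is passed as an Integer object; with a plain int, add treats the number as a position inside the default layer. Below is a small sketch that can be run without a window. The class name LayerDemo and the method buildLayers are hypothetical, and the three labels stand in for the tileset and character components.

```java
import javax.swing.JLabel;
import javax.swing.JLayeredPane;

public class LayerDemo {

    // Puts each component into its own layer; higher layer numbers are painted on top.
    static JLayeredPane buildLayers(JLabel ground, JLabel character, JLabel treeTop) {
        JLayeredPane layers = new JLayeredPane();
        layers.add(ground, Integer.valueOf(0));    // layer 0: under the character
        layers.add(character, Integer.valueOf(1)); // layer 1: the character
        layers.add(treeTop, Integer.valueOf(2));   // layer 2: over the character
        return layers;
    }

    public static void main(String[] args) {
        JLabel ground = new JLabel("ground");
        JLabel character = new JLabel("character");
        JLabel treeTop = new JLabel("tree");
        JLayeredPane layers = buildLayers(ground, character, treeTop);

        System.out.println("character layer: " + layers.getLayer(character));
        // Component index 0 is the topmost component in the pane.
        System.out.println("topmost: " + ((JLabel) layers.getComponent(0)).getText());
    }
}
```

Since JLayeredPane has no layout manager by default, each component still needs setBounds before it becomes visible, as in the snippets in this answer.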
Code example:
private void init()
{
frame.setDefaultCloseOperation(javax.swing.JFrame.EXIT_ON_CLOSE);
frame.setSize(Settings.getX(), Settings.getY());
frame.setResizable(false);
frame.setLocationRelativeTo(null);
for (int i = 0; i < y ; i++)
{
for (int j = 0; j < x; j++)
{
for (int k = 0; k < tilesetsOnCharactersSize; k++)
{
tilesetsOnCharacters[i][j][k] = new javax.swing.JLabel(new javax.swing.ImageIcon(Resources.getTileset(Maps.getMapTileset(mapNumber, 1, k, i, j))));
tilesetsOnCharacters[i][j][k].setBounds(j * tilesetX, i * tilesetY, tilesetX, tilesetY);
map.add(tilesetsOnCharacters[i][j][k], Integer.valueOf(4)); // layer 4 (assuming map is a JLayeredPane); an int would be read as a position
}
for (int k = 0; k < tilesetsUnderCharactersSize; k++)
{
tilesetsUnderCharacters[i][j][k] = new javax.swing.JLabel(new javax.swing.ImageIcon(Resources.getTileset(Maps.getMapTileset(mapNumber, 0, k, i, j))));
tilesetsUnderCharacters[i][j][k].setBounds(j * tilesetX, i * tilesetY, tilesetX, tilesetY);
map.add(tilesetsUnderCharacters[i][j][k], Integer.valueOf(0)); // layer 0 (assuming map is a JLayeredPane); an int would be read as a position
}
for (int k = 0; k < mapAttributeSize; k++)
{
if (Maps.getMapTileset(mapNumber, 2, k, i, j) == 1)
{
blocked[i][j] = true;
}
}
}
}
for (int i = 0; i < charactersNumber; i++)
{
characters[i] = new Character(0, 0, 64, 64, 0, 0, 0, 0, 5);
tilesetsCharacters[i] = new javax.swing.JLabel(new javax.swing.ImageIcon(Characters.getCharacter(characters[i].getCharacterSkin(), characters[i].getDirection())));
tilesetsCharacters[i].setBounds(characters[i].getX(), characters[i].getY(), characters[i].getSizeX(), characters[i].getSizeY());
map.add(tilesetsCharacters[i], Integer.valueOf(1)); // layer 1 (assuming map is a JLayeredPane); an int would be read as a position
charactersRender[i] = false;
}
frame.addKeyListener(new java.awt.event.KeyAdapter()
{
@Override
public void keyTyped(java.awt.event.KeyEvent keyEvent)
{
}
@Override
public void keyPressed(java.awt.event.KeyEvent keyEvent)
{
if((keyEventInt = keyEvent.getKeyCode()) == java.awt.event.KeyEvent.VK_F)
{
right = true;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_S)
{
left = true;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_E)
{
up = true;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_D)
{
down = true;
}
}
@Override
public void keyReleased(java.awt.event.KeyEvent keyEvent)
{
if((keyEventInt = keyEvent.getKeyCode()) == java.awt.event.KeyEvent.VK_F)
{
right = false;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_S)
{
left = false;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_E)
{
up = false;
}
else if(keyEventInt == java.awt.event.KeyEvent.VK_D)
{
down = false;
}
}
});
frame.setContentPane(map);
frame.setVisible(true);
}
private void update()
{
if (exit && characters[0].getX() < x * tilesetX - characters[0].getSizeX() - characters[0].getMovementSpeed() && characters[0].getX() > 0 && characters[0].getY() > 0 && characters[0].getY() < y * tilesetY - characters[0].getSizeY() - characters[0].getMovementSpeed())
{
exit = false;
}
if (right && (exit || (characters[0].getX() < x * tilesetX - characters[0].getSizeX() - characters[0].getMovementSpeed() && !blocked[characters[0].getY() / tilesetY][(characters[0].getX() + characters[0].getSizeX() + characters[0].getMovementSpeed()) / tilesetX] && !blocked[(characters[0].getY() + characters[0].getSizeY()) / tilesetY][(characters[0].getX() + characters[0].getSizeX() + characters[0].getMovementSpeed()) / tilesetX])))
{
characters[0].right();
characters[0].setScaleX(5);
if (allowExitRight && characters[0].getX() > x * tilesetX - characters[0].getSizeX() - characters[0].getMovementSpeed() - 1)
exit = true;
charactersRender[0] = true;
}
if (left && (exit || (characters[0].getX() > 0 && !blocked[characters[0].getY() / tilesetY][(characters[0].getX() - characters[0].getMovementSpeed()) / tilesetX] && !blocked[(characters[0].getY() + characters[0].getSizeY()) / tilesetY][(characters[0].getX() - characters[0].getMovementSpeed()) / tilesetX])))
{
characters[0].left();
characters[0].setScaleX(-3);
if (allowExitLeft && characters[0].getX() <= 0)
exit = true;
charactersRender[0] = true;
}
if (jumped || up && (exit || (characters[0].getY() > 0 && !blocked[(characters[0].getY() - characters[0].getMovementSpeed()) / tilesetY][characters[0].getX() / tilesetX] && !blocked[(characters[0].getY() - characters[0].getMovementSpeed()) / tilesetY][(characters[0].getX() + characters[0].getSizeX()) / tilesetX])))
{
if (!jump)
{
characters[0].up();
characters[0].setScaleY(-3);
if (allowExitUp && characters[0].getY() <= 0)
exit = true;
charactersRender[0] = true;
}
else if (!jumped && !falling)
{
jumpCurrentDuration = jumpDuration;
jumped = true;
}
else if (--jumpCurrentDuration > 0)
{
if (exit || (characters[0].getY() > 0 && !blocked[(characters[0].getY() - characters[0].getMovementSpeed()) / tilesetY][characters[0].getX() / tilesetX] && !blocked[(characters[0].getY() - characters[0].getMovementSpeed()) / tilesetY][(characters[0].getX() + characters[0].getSizeX()) / tilesetX]))
{
characters[0].up();
characters[0].setScaleY(-3);
if (allowExitUp && characters[0].getY() <= 0)
exit = true;
charactersRender[0] = true;
}
}
else
{
jumped = false;
}
}
if (((down && !jumped) || (gravity && !jumped)) && (exit || (characters[0].getY() < y * tilesetY - characters[0].getSizeY() - characters[0].getMovementSpeed() && !blocked[(characters[0].getY() + characters[0].getSizeY() + characters[0].getMovementSpeed()) / tilesetY][characters[0].getX() / tilesetX] && !blocked[(characters[0].getY() + characters[0].getSizeY() + characters[0].getMovementSpeed()) / tilesetY][(characters[0].getX() + characters[0].getSizeX()) / tilesetX])))
{
characters[0].down();
characters[0].setScaleY(5);
if (allowExitDown && characters[0].getY() > y * tilesetY - characters[0].getSizeY() - characters[0].getMovementSpeed())
exit = true;
if (jump)
falling = true;
charactersRender[0] = true;
}
else if (jump)
falling = false;
}
private void render()
{
for (int i = 0; i < charactersNumber; i++)
{
if (charactersRender[i])
{
tilesetsCharacters[i].setIcon(new javax.swing.ImageIcon(Characters.getCharacter(characters[i].getCharacterSkin(), characters[i].getDirection())));
tilesetsCharacters[i].setBounds(characters[i].getX() + characters[i].getScaleX(), characters[i].getY() + characters[i].getScaleY(), characters[i].getSizeX(), characters[i].getSizeY());
charactersRender[i] = false;
}
}
}
Edit: I also found a library called Slick2D that works with the Tiled map editor:
http://slick.ninjacave.com/
http://www.mapeditor.org/
How to set up Slick2D: How to install Slick2d?
How to use Slick2D with Tiled: Slick2D + Tiled cant load map
Where to start: https://thejavablog.wordpress.com/2008/06/08/using-slick-2d-to-write-a-game/
I want to create an audio level meter in Java for the microphone, to check how loud the input is. It should look like the one in the OS. I'm not asking about the GUI; this is just about calculating the audio level from the byte stream produced by
n = targetDataLine.read( tempBuffer , 0 , tempBuffer.length );
I already have something running, but it is not even close to the level meter of my OS (Windows): it gets stuck in the middle. I get values between 0 and 100, which is good, but at medium volume it stays around 60 no matter how loud the input is.
This is how I calculate it now:
amplitude = 0;
for (int j = 0; j < tempBuffer.length; j = j +2 ){
if (tempBuffer[j] > tempBuffer[j+1])
amplitude = amplitude + tempBuffer[j] - tempBuffer[j+1];
else amplitude = amplitude + tempBuffer[j + 1] - tempBuffer[j];
}
amplitude = amplitude / tempBuffer.length * 2;
Is there a better, more precise way to calculate the audio level for monitoring? Or did I perhaps make a major mistake?
This is my audio format:
public static AudioFormat getAudioFormat(){
float sampleRate = 20000.0F;
//8000,11025,16000,22050,44100
int sampleSizeInBits = 16;
//8,16
int channels = 1;
//1,2
boolean signed = true;
//true,false
boolean bigEndian = false;
//true,false
return new AudioFormat( sampleRate, sampleSizeInBits, channels, signed, bigEndian );
//return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, 8000.0F, 8, 1, 1, 8000.0F, false);
}
Principally the problem seems to be that you are reading the audio data incorrectly.
Specifically I'm not really sure what this excerpt is supposed to mean:
if (tempBuffer[j] > tempBuffer[j+1])
... tempBuffer[j] - tempBuffer[j+1];
else
... tempBuffer[j + 1] - tempBuffer[j];
But anyhow since you are recording 16-bit data the bytes in the byte array aren't meaningful on their own. Each byte only represents 1/2 of the bits in each sample. You need to 'unpack' them to int, float, whatever, before you can do anything with them. For raw LPCM, concatenating the bytes is done by shifting them and ORing them together.
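As a small worked example of the shifting and ORing, here is a sketch that unpacks signed 16-bit little-endian PCM (the format in the question) into floats and computes an RMS level. The class name SampleUnpacker and the methods unpack and rms are made up for illustration; a big-endian format would need the two bytes swapped.

```java
public class SampleUnpacker {

    // Converts signed 16-bit little-endian PCM bytes to floats in [-1.0, 1.0).
    static float[] unpack(byte[] buf, int byteCount) {
        float[] samples = new float[byteCount / 2];
        for (int s = 0; s < samples.length; s++) {
            int lo = buf[2 * s] & 0xFF;   // low byte, masked so it is treated as unsigned
            int hi = buf[2 * s + 1];      // high byte, keeps its sign
            samples[s] = ((hi << 8) | lo) / 32768f;
        }
        return samples;
    }

    // Root mean square of the samples: 0.0 for silence, about 0.707 for a full-scale sine.
    static float rms(float[] samples) {
        double sum = 0;
        for (float v : samples) sum += v * v;
        return samples.length == 0 ? 0f : (float) Math.sqrt(sum / samples.length);
    }

    public static void main(String[] args) {
        // Two samples: bytes 00 40 form 0x4000 = 16384, bytes 00 C0 form 0xC000 = -16384.
        byte[] buf = { 0x00, 0x40, 0x00, (byte) 0xC0 };
        float[] samples = unpack(buf, buf.length);
        System.out.println(samples[0] + " " + samples[1] + " rms=" + rms(samples));
    }
}
```

Subtracting neighbouring bytes, as in the question, mixes the low byte of a sample with its own high byte, which is probably why the meter barely reacts to the input level.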
Here is an MCVE to demonstrate a rudimentary level meter (both RMS and simple peak hold) in Java.
import javax.swing.SwingUtilities;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.JComponent;
import java.awt.BorderLayout;
import java.awt.Graphics;
import java.awt.Color;
import java.awt.Dimension;
import javax.swing.border.EmptyBorder;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.TargetDataLine;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
public class LevelMeter extends JComponent {
private int meterWidth = 10;
private float amp = 0f;
private float peak = 0f;
public void setAmplitude(float amp) {
this.amp = Math.abs(amp);
repaint();
}
public void setPeak(float peak) {
this.peak = Math.abs(peak);
repaint();
}
public void setMeterWidth(int meterWidth) {
this.meterWidth = meterWidth;
}
@Override
protected void paintComponent(Graphics g) {
int w = Math.min(meterWidth, getWidth());
int h = getHeight();
int x = getWidth() / 2 - w / 2;
int y = 0;
g.setColor(Color.LIGHT_GRAY);
g.fillRect(x, y, w, h);
g.setColor(Color.BLACK);
g.drawRect(x, y, w - 1, h - 1);
int a = Math.round(amp * (h - 2));
g.setColor(Color.GREEN);
g.fillRect(x + 1, y + h - 1 - a, w - 2, a);
int p = Math.round(peak * (h - 2));
g.setColor(Color.RED);
g.drawLine(x + 1, y + h - 1 - p, x + w - 1, y + h - 1 - p);
}
@Override
public Dimension getMinimumSize() {
Dimension min = super.getMinimumSize();
if(min.width < meterWidth)
min.width = meterWidth;
if(min.height < meterWidth)
min.height = meterWidth;
return min;
}
@Override
public Dimension getPreferredSize() {
Dimension pref = super.getPreferredSize();
pref.width = meterWidth;
return pref;
}
@Override
public void setPreferredSize(Dimension pref) {
super.setPreferredSize(pref);
setMeterWidth(pref.width);
}
public static void main(String[] args) {
SwingUtilities.invokeLater(new Runnable() {
@Override
public void run() {
JFrame frame = new JFrame("Meter");
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
JPanel content = new JPanel(new BorderLayout());
content.setBorder(new EmptyBorder(25, 50, 25, 50));
LevelMeter meter = new LevelMeter();
meter.setPreferredSize(new Dimension(9, 100));
content.add(meter, BorderLayout.CENTER);
frame.setContentPane(content);
frame.pack();
frame.setLocationRelativeTo(null);
frame.setVisible(true);
new Thread(new Recorder(meter)).start();
}
});
}
static class Recorder implements Runnable {
final LevelMeter meter;
Recorder(final LevelMeter meter) {
this.meter = meter;
}
@Override
public void run() {
AudioFormat fmt = new AudioFormat(44100f, 16, 1, true, false);
final int bufferByteSize = 2048;
TargetDataLine line;
try {
line = AudioSystem.getTargetDataLine(fmt);
line.open(fmt, bufferByteSize);
} catch(LineUnavailableException e) {
System.err.println(e);
return;
}
byte[] buf = new byte[bufferByteSize];
float[] samples = new float[bufferByteSize / 2];
float lastPeak = 0f;
line.start();
for(int b; (b = line.read(buf, 0, buf.length)) > -1;) {
// convert bytes to samples here
for(int i = 0, s = 0; i < b;) {
int sample = 0;
sample |= buf[i++] & 0xFF; // (reverse these two lines
sample |= buf[i++] << 8; // if the format is big endian)
// normalize to range of +/-1.0f
samples[s++] = sample / 32768f;
}
float rms = 0f;
float peak = 0f;
for(float sample : samples) {
float abs = Math.abs(sample);
if(abs > peak) {
peak = abs;
}
rms += sample * sample;
}
rms = (float)Math.sqrt(rms / samples.length);
if(lastPeak > peak) {
peak = lastPeak * 0.875f;
}
lastPeak = peak;
setMeterOnEDT(rms, peak);
}
}
void setMeterOnEDT(final float rms, final float peak) {
SwingUtilities.invokeLater(new Runnable() {
@Override
public void run() {
meter.setAmplitude(rms);
meter.setPeak(peak);
}
});
}
}
}
Note the format conversion is hard-coded there.
You may also see "How do I use audio sample data from Java Sound?" for my detailed explanation of how to unpack audio data from the raw bytes.
Related:
How to keep track of audio playback position?
How to make waveform rendering more interesting?
The above code will find the data point with the highest value, but it cannot determine the peak value of the reconstructed signal. To find the reconstructed peak you would have to pass the data samples through a low-pass filter or use a DFT/FFT algorithm.
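To illustrate the low-pass filter route, here is a rough sketch that evaluates a Hann-windowed sinc interpolation between the samples and reports the largest value it finds. Everything in it is an assumption made for illustration: the class name TruePeakEstimate, the 4x oversampling factor, and the 16-sample half width of the window. It is not a standards-compliant true-peak meter.

```java
public class TruePeakEstimate {

    // Largest absolute sample value (what a plain peak meter reports).
    static float samplePeak(float[] x) {
        float peak = 0f;
        for (float v : x) peak = Math.max(peak, Math.abs(v));
        return peak;
    }

    // Evaluates a Hann-windowed sinc interpolation at `factor` points per
    // sample interval and returns the largest absolute value found.
    static float interpolatedPeak(float[] x, int factor, int halfTaps) {
        double peak = 0;
        for (int n = 0; n < x.length; n++) {
            for (int k = 0; k < factor; k++) {
                double t = n + (double) k / factor; // position between samples
                double y = 0;
                for (int m = n - halfTaps + 1; m <= n + halfTaps; m++) {
                    if (m < 0 || m >= x.length) continue; // treat missing samples as silence
                    double d = t - m;
                    y += x[m] * sinc(d) * hann(d, halfTaps);
                }
                peak = Math.max(peak, Math.abs(y));
            }
        }
        return (float) peak;
    }

    static double sinc(double d) {
        return d == 0 ? 1.0 : Math.sin(Math.PI * d) / (Math.PI * d);
    }

    static double hann(double d, int halfTaps) {
        return 0.5 * (1.0 + Math.cos(Math.PI * d / halfTaps));
    }

    public static void main(String[] args) {
        // A full-scale sine at a quarter of the sample rate, phased so that
        // every sample lands about 3 dB below the waveform's real peak.
        float[] x = new float[64];
        for (int n = 0; n < x.length; n++)
            x[n] = (float) Math.sin(Math.PI / 2 * n + Math.PI / 4);
        System.out.println("sample peak:       " + samplePeak(x));
        System.out.println("interpolated peak: " + interpolatedPeak(x, 4, 16));
    }
}
```

With this test signal the stored samples never exceed roughly 0.707, while the interpolated estimate comes out close to the sine's actual amplitude of 1.0.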
import java.awt.*;
import java.awt.geom.Line2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.applet.Applet;
import java.util.Scanner;
public class Histogram extends Applet{
static int [] scores= {13,30,23,8};
static int [] minInterval = {0,25,50,75};
static int [] maxInterval = {25,50,75,100};
public void paint (Graphics g){
int max = 0;
for (int i = 0; i < scores.length; i++) {
if (max < scores[i]) {
max = scores[i];
}
}
Graphics2D g2 = (Graphics2D)g;
Point2D.Double Yi = new Point2D.Double(50,50);
Point2D.Double Yf = new Point2D.Double(50,30*scores.length);
Line2D.Double Y = new Line2D.Double (Yi,Yf);
Point2D.Double Xi = new Point2D.Double(50,50);
Point2D.Double Xf = new Point2D.Double(50+(8*max),50);
Line2D.Double X = new Line2D.Double (Xi,Xf);
int x = 8*max;
//Draw the "Score"
int headerX = 50+(x/(max/5))*((max/5)-1);
g2.drawString("Histogram of Student Scores",(headerX),30);
for(int i=0;i<=max/5;i++){
int j = (i)*5;
if(i<max/5)
g2.drawString(String.format("%d",j),50+(x/(max/5))*i,50);
else
g2.drawString(String.format("Number of Students"),50+(x/(max/5))*i,50);
}
for(int i=0;i<=maxInterval.length;i++){
if(i != maxInterval.length-1 )
g2.drawString(String.format("[%d,%d)",minInterval,maxInterval),20,60+(30)*i);
else if(i == maxInterval.length-1)
g2.drawString(String.format("[%d,%d]",minInterval,maxInterval),20,60+(30)*i);
else
g2.drawString("Score Ranges",20,60+(30)*i);
}
g2.draw(X);
g2.draw(Y);
}
}
My problem is that the code doesn't seem to enter the third loop, but when I test the loop in another
method, it sort of works. So I don't know what to do next, and I want to know why it doesn't
execute that code block.
Thanks in advance.
You were supplying an array as a parameter to String.format("...", , ) (instead of an int, which you get by looking up an element from the array, presumably with index i).
Since it was an argument to String.format, you most likely weren't getting a compiler or IDE warning.
So if you fix that, the code looks like this and you can take it from there.
for (int i = 0; i <= maxInterval.length; i++) {
if (i < maxInterval.length - 1)
g2.drawString(String.format("[%d,%d)", minInterval[i], maxInterval[i]), 20, 60 + (30) * i);
else if (i == maxInterval.length - 1)
g2.drawString(String.format("[%d,%d]", minInterval[i], maxInterval[i]), 20, 60 + (30) * i);
else
g2.drawString("Score Ranges", 20, 60 + (30) * i);
}
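If you want to check the formatting outside the applet, here is a small stand-alone sketch. The class IntervalLabels and its methods are hypothetical names; the arrays are the ones from the question. It also shows that handing the whole arrays to %d compiles but throws an IllegalFormatConversionException at runtime, which may be why the loop in paint appeared not to run.

```java
import java.util.IllegalFormatConversionException;

public class IntervalLabels {
    static final int[] MIN = {0, 25, 50, 75};
    static final int[] MAX = {25, 50, 75, 100};

    // Builds the label for interval i; only the last interval is closed on the right.
    static String label(int i) {
        boolean last = (i == MAX.length - 1);
        return String.format(last ? "[%d,%d]" : "[%d,%d)", MIN[i], MAX[i]);
    }

    // Returns true if formatting the whole arrays with %d fails at runtime.
    static boolean formattingArraysFails() {
        try {
            String.format("[%d,%d)", MIN, MAX); // compiles: each int[] is passed as one Object
            return false;
        } catch (IllegalFormatConversionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < MAX.length; i++)
            System.out.println(label(i));
        System.out.println("whole arrays rejected: " + formattingArraysFails());
    }
}
```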