Weka Prediction margin using Java API


Hello everyone,
I'm using the Weka Java API for predictions. I was able to get the actual and predicted class for each instance from Java code, but now I also want the 'prediction margin' from the final results. I can get it from the GUI, but I need a Java solution. I'd appreciate any help. What I want to obtain in Java is the margin value that the GUI displays.
The code below is what I currently use to print the actual and predicted classes.
for (int i = 0; i < testDataSet.numInstances(); i++) {
double actualClass = testDataSet.instance(i).classValue();
String actual = testDataSet.classAttribute().value((int) actualClass);
Instance newInst = testDataSet.instance(i);
double preJ48 = tree.classifyInstance(newInst);
String predictionString = testDataSet.classAttribute().value((int) preJ48);
System.out.println("Actual : " + actual + " Prediction : " + predictionString);
}
Solution I found:
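J48 tree = new J48();
tree.buildClassifier(trainDataSet);
// Collect the per-instance prediction output in a buffer
// (PlainText is in weka.classifiers.evaluation.output.prediction)
StringBuffer predsBuffer = new StringBuffer();
PlainText plainText = new PlainText();
plainText.setHeader(trainDataSet);
plainText.setBuffer(predsBuffer);
Evaluation eval = new Evaluation(trainDataSet);
double a = eval.evaluateModelOnceAndRecordPrediction(tree, testDataSet.instance(0));
eval.evaluateModel(tree, testDataSet, plainText);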
for (String line : predsBuffer.toString().split("\n")) {
String[] linesplit = line.split("\\s+");
// If there's an error(error flag "+"), the length of linesplit is 6, otherwise 5
System.out.println("linesplit "+linesplit.length);
int id;
String expectedValue, classification;
double probability;
if (line.contains("+")) {
probability = Double.parseDouble(linesplit[5]);
System.out.println("Its Minus "+probability);
} else {
probability = Double.parseDouble(linesplit[4]);
System.out.println("Its Plus "+probability);
}
}

The prediction margin that you are referring to gets generated by the weka.gui.explorer.ClassifierErrorsPlotInstances class. Check the variables probActual and probNext in its process method.
This margin is simply the difference between the probability for the actual class label and the highest probability of the label that isn't the actual class label.
You can use the distributionForInstance method of your classifier to obtain the class distribution array and then determine these two probabilities to calculate the margin for the prediction.
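The calculation described above can be sketched as follows. This is a minimal illustration, not Weka's own code: the class name MarginSketch and the method predictionMargin are hypothetical, and the distribution array is assumed to come from tree.distributionForInstance(inst), with the actual index taken from (int) inst.classValue().

```java
final class MarginSketch {

    /**
     * Margin = probability of the actual class minus the highest
     * probability among all other classes.
     *
     * @param dist        class distribution, assumed to come from
     *                    classifier.distributionForInstance(instance)
     * @param actualIndex index of the actual class label
     */
    static double predictionMargin(double[] dist, int actualIndex) {
        double probActual = dist[actualIndex];
        double probNext = 0.0;
        for (int i = 0; i < dist.length; i++) {
            if (i != actualIndex && dist[i] > probNext) {
                probNext = dist[i];
            }
        }
        return probActual - probNext;
    }

    public static void main(String[] args) {
        // Hand-written distribution standing in for a classifier's output.
        double[] dist = {0.7, 0.2, 0.1};
        System.out.println("margin = " + predictionMargin(dist, 0));
    }
}
```

A positive margin means the actual class received the highest probability; a negative margin means some other class was ranked above it.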

Related

Removing the last class attribute in test ARFF file for Weka ML Model not working in predicting model

Basically, I'm building a machine learning model in Java (Weka) to detect some patterns in strings. I have 2 class attributes that I'm trying to get my model to predict based on these patterns. My code works when I leave the attribute values in the ARFF file, but it doesn't when I take it out and replace it with question marks in the test file. When I do this, it gives me all the same values (cfb) in the output. I know the model isn't hard-coded but for testing purposes, I would like to remove these attribute values. I have already built the classifier and evaluated the model.
/**
* Make predictions based on that model. Improve the model
*
* @throws Exception
*/
public void modelPredictions(Instances trainedDataSet, Instances testedDataSet, Classifier classifierType) throws Exception {
// Get the number of classes
int numClasses = trainedDataSet.numClasses();
// print out class values in the training dataset
for (int i = 0; i < numClasses; i++) {
// get class string value using the class index
String classValue = trainedDataSet.classAttribute().value(i);
System.out.println("Class Value " + i + " is " + classValue);
}
// set class index to the last attribute
// loop through the new dataset and make predictions
System.out.println("===================");
System.out.println("Actual Class, NB Predicted");
for (int i = 0; i < testedDataSet.numInstances(); i++) {
// get class double value for current instance
double actualClass = testedDataSet.instance(i).classValue();
// get class string value using the class index using the class's int value
String actual = testedDataSet.classAttribute().value((int) actualClass);
// get Instance object of current instance
Instance newInst = testedDataSet.instance(i);
// call classifyInstance, which returns a double value for the class
double predNB = classifierType.classifyInstance(newInst);
// use this value to get string value of the predicted class
String predString = testedDataSet.classAttribute().value((int) predNB);
System.out.println(actual + ", " + predString);
}
}
Image of the test ARFF file (sorry, I was getting errors when pasting the file's content).

If you replace the actual class in your test set with question marks, these get interpreted as missing values. A missing value in Weka is represented by Double.NaN. Casting a missing value (i.e. Double.NaN) to an int results in 0, which is the index of the first nominal value of your class. As a result, the "actual" class you print will always be the first class label.
The following code:
double missing = Utils.missingValue();
System.out.println("missing value as double: " + missing);
System.out.println("missing value as int: " + ((int) missing));
Outputs this:
missing value as double: NaN
missing value as int: 0
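One way to avoid printing a misleading label is to test for NaN before casting. The sketch below is plain Java with hypothetical names (MissingClassSketch, labelOrMissing) and a hand-written label array; in Weka code the same check could be made with Instance.classIsMissing() before calling classValue().

```java
final class MissingClassSketch {

    /**
     * Returns the label for a class value, or "?" when the value is missing.
     * Weka encodes a missing value as Double.NaN, and (int) Double.NaN is 0,
     * so the check has to happen before the cast.
     */
    static String labelOrMissing(double classValue, String[] labels) {
        if (Double.isNaN(classValue)) {
            return "?";
        }
        return labels[(int) classValue];
    }

    public static void main(String[] args) {
        String[] labels = {"cfb", "other"}; // hypothetical class labels
        System.out.println(labelOrMissing(1.0, labels));
        System.out.println(labelOrMissing(Double.NaN, labels));
    }
}
```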

Serialize list of complex numbers in python for Java

I have a pipeline where I stream data from Python and connect to the stream in a Java application. The data records are matrices of complex numbers. I've learned that json.dumps() can't deal with Python's complex type.
For the moment I've converted the complex values to strings and put them in a dictionary like this:
for entry in range(len(data_array)):
    # one string per matrix row, since json.dumps() cannot encode complex values
    data_as_string = [str(i) for i in data_array[entry]["DATA"].tolist()]
    send({'data': data_as_string,
          'coords': data_array[entry]["UVW"].tolist()})
and sent it to the pipeline. But this requires extensive and expensive custom deserialization in Java, which increases the running time of the pipeline considerably.
Currently I'm doing the deserialization like this:
JSONObject jsonObject = new JSONObject(string);
try {
    data = jsonObject.getString("data");
    uvw = jsonObject.getString("coords");
} catch (JSONException ex) {
    ex.printStackTrace();
}
And then I'm doing a lot of data.replace(string1, string2) to remove some of the signs added by the serialization and then looping through the matrix to convert every number into a Java Complex type.
My Java deserialization code looks the following:
data = data.replace("(","");
data = data.replace(")","");
data = data.replace("\"","");
data = data.replace("],[","¦");
data = data.replace("[","");
data = data.replace("]","");
uvw = uvw.replace("[","");
uvw = uvw.replace("]","");
String[] frequencyArrays = data.split("¦");
Complex[][] tempData = new Complex[48][4];
for(int i=0;i< frequencyArrays.length;i++){
String[] complexNumbersOfAFrequency = frequencyArrays[i].split(", ");
for(int j =0;j<complexNumbersOfAFrequency.length;j++){
boolean realPartNegative = false;
Complex c;
if(complexNumbersOfAFrequency[j].startsWith("-")){
realPartNegative = true;
//Get rid of the first - sign to be able to split the real & imaginary parts
complexNumbersOfAFrequency[j] =complexNumbersOfAFrequency[j].replaceFirst("-","");
}
if(complexNumbersOfAFrequency[j].contains("+")){
String[] realAndImaginary = complexNumbersOfAFrequency[j].split("\\+");
try {
double real = Double.parseDouble(realAndImaginary[0]);
double imag = Double.parseDouble(realAndImaginary[1].replace("j",""));
if(realPartNegative){
c = new Complex(-real,imag);
} else {
c = new Complex(real,imag);
}
}catch(IndexOutOfBoundsException e) {
//System.out.println("Wrongly formatted number, setting it to 0");
c = new Complex(0,0);
}
catch (NumberFormatException e){
System.out.println("Wrongly formatted number, setting it to 0");
c = new Complex(0,0);
}
} else {
String[] realAndImaginary = complexNumbersOfAFrequency[j].split("-");
try {
double real = Double.parseDouble(realAndImaginary[0]);
double imag = Double.parseDouble(realAndImaginary[1].replace("j", "").replace("e", ""));
if (realPartNegative) {
c = new Complex(-real, -imag);
} else {
c = new Complex(real, -imag);
}
}
catch(IndexOutOfBoundsException e){
System.out.println("Not correctly formatted: ");
for(int temp = 0;temp<realAndImaginary.length;temp++){
System.out.println(realAndImaginary[temp]);
}
System.out.println("Setting it to (0,0)");
c = new Complex(0,0);
}
catch (NumberFormatException e){
c = new Complex(0,0);
}
}
tempData[i][j] = c;
}
}
Now my question is whether there is a way to either
1) deserialize the dictionary in Java without expensive string manipulation and looping through the matrices for each record, or
2) do a better job of serializing the data in Python so that it can be deserialized more easily in Java.
Any hints are appreciated.
Edit: The JSON looks like the following:
{"data": ["[(1 + 2j), (3 + 4j), ...]","[(5 + 6j), ...]", ...],
"coords": [1,2,3]}
Edit: For the coordinates I can do the deserialization in Java pretty easily:
uvw = uvw.replace("[","");
uvw = uvw.replace("]","");
String[] coords = uvw.split(",");
I then convert the strings in coords with Double.parseDouble(). For the data string, however, this is much more complicated, because the string is full of characters that need to be removed in order to get the actual numbers and put them in the right place in the Complex[][] I want to convert it to.

You are over-using JsonObject.getString, by using it to retrieve non-string data.
Let’s start with the coords property, since it’s a simpler case. [1,2,3] is not a string. It’s an array of numbers. Therefore, you should retrieve it as an array:
JsonArray coords = jsonObject.getJsonArray("coords");
int count = coords.size();
double[] uvw = new double[count];
for (int i = 0; i < count; i++) {
uvw[i] = coords.getJsonNumber(i).doubleValue();
}
The other property, data, is also an array, but with string elements:
JsonArray data = jsonObject.getJsonArray("data");
int count = data.size();
for (int i = 0; i < count; i++) {
String complexValuesStr = data.getString(i);
// ...
}
As for parsing out the complex numbers, I wouldn’t make all those String.replace calls. Instead, you can look for each complex value with a regular expression matcher:
Pattern complexNumberPattern = Pattern.compile(
"\\(\\s*" + // opening parenthesis
"(-?[0-9.]+)" + // group 1: match real part
"\\s*([-+])\\s*" + // group 2: match sign
"([0-9.]+)j" + // group 3: match imaginary part
"\\s*\\)"); // closing parenthesis
Matcher matcher = complexNumberPattern.matcher("");
JsonArray data = jsonObject.getJsonArray("data");
int count = data.size();
List<List<Complex>> allFrequencyValues = new ArrayList<>(count);
for (int i = 0; i < count; i++) {
String complexValuesStr = data.getString(i);
List<Complex> singleFrequencyValues = new ArrayList<>();
matcher.reset(complexValuesStr);
while (matcher.find()) {
double real = Double.parseDouble(matcher.group(1));
boolean positive = matcher.group(2).equals("+");
double imaginary = Double.parseDouble(matcher.group(3));
Complex value = new Complex(real, positive ? imaginary : -imaginary);
singleFrequencyValues.add(value);
}
allFrequencyValues.add(singleFrequencyValues);
}
You should not catch IndexOutOfBoundsException or NumberFormatException. Those indicate the input was invalid. You should not treat invalid input like it’s zero; it means the sender made an error, and you should make sure to let them know it. An exception is a good way to do that.
I have made the assumption that both terms are always present in each complex expression. For instance, 2i would appear as 0 + 2j, not just 2j. And a real number like 5 would appear as 5 + 0j. If that is not a safe assumption, the parsing gets more complicated.
Since you are concerned with performance, I would try the above; if the use of a regular expression makes the program too slow, you can always look for the parentheses and terms yourself, by stepping through the string. It will be more work but may provide a speed increase.
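As a rough illustration of that last option, here is a sketch of a hand-written scanner that looks for the parentheses and the separating sign itself. The names ComplexScanSketch and parseRow are hypothetical, and the nested Complex class is only a stand-in for whatever complex type the pipeline uses. It keeps the same assumption that both terms are present in each value, and it additionally tolerates exponents such as 1.5e-3. Whether it actually beats the regular expression would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;

final class ComplexScanSketch {

    /** Hypothetical stand-in for the Complex class used in the pipeline. */
    static final class Complex {
        final double re;
        final double im;
        Complex(double re, double im) { this.re = re; this.im = im; }
    }

    /** Parses one row such as "[(1 + 2j), (3 - 4j)]" by scanning for parentheses. */
    static List<Complex> parseRow(String row) {
        List<Complex> values = new ArrayList<>();
        int pos = 0;
        int open;
        while ((open = row.indexOf('(', pos)) >= 0) {
            int close = row.indexOf(')', open);
            if (close < 0) {
                throw new IllegalArgumentException("Unclosed parenthesis at index " + open);
            }
            String body = row.substring(open + 1, close).replace(" ", "");
            if (!body.endsWith("j")) {
                throw new IllegalArgumentException("Missing imaginary part: " + body);
            }
            // Search backwards for the sign between the two terms, skipping
            // the leading sign (index 0) and signs that belong to an exponent.
            int split = -1;
            for (int i = body.length() - 2; i > 0; i--) {
                char c = body.charAt(i);
                char prev = body.charAt(i - 1);
                if ((c == '+' || c == '-') && prev != 'e' && prev != 'E') {
                    split = i;
                    break;
                }
            }
            if (split < 0) {
                throw new IllegalArgumentException("Missing real part: " + body);
            }
            // Invalid numbers surface as NumberFormatException instead of becoming 0.
            double re = Double.parseDouble(body.substring(0, split));
            double im = Double.parseDouble(body.substring(split, body.length() - 1));
            values.add(new Complex(re, im));
            pos = close + 1;
        }
        return values;
    }

    public static void main(String[] args) {
        for (Complex c : parseRow("[(1 + 2j), (3 - 4j), (-1.5e-3 + 2j)]")) {
            System.out.println(c.re + " " + c.im);
        }
    }
}
```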

If I understand you correctly, your matrix would consist of arrays of complex numbers which in turn would contain a real number and an imaginary one.
If so, your data could look like this:
[[{"r":1,"j":2},{"r":3,"j":4}, ...],[{"r":5,"j":6}, ...]]
That means that you have a JSON array which contains arrays that contain objects. Those objects have 2 properties: r defining the value of the real number and j the value of the imaginary one.
Parsing that in Java should be straightforward: with a mapper like Jackson or Gson, for example, you'd just parse it into something like ComplexNumber[][], where ComplexNumber could look like this (simplified):
public class ComplexNumber {
public double r;
public double j;
}
Of course there may be already existing classes for complex numbers so you might want to use those. Additionally you might have to deserialize that manually (either because the target classes don't make it easy for the mappers or you can't/don't want to use a mapper) but in that case it would be just a matter of iterating over the JSONArray elements and extracting r and j from the JSONObjects.

Input/output in GLPK for Java

I find a lot of GLPK for Java examples about how to specify the model (problem/constraints) to the solver and read parameters from a data file, but very little about programmatic parameter input/output.
In my case I need to submit values (arrays of weights and values) to a knapsack problem programmatically and postprocess the solution as well (perform additional numeric checks on the solution found) in order to decide whether to proceed or not.
Think of the equivalent of reading a param: line from a data file without calling glp_mpl_read_data or printing details of a solution to a file without calling glp_print_mip/sol/itp.
Can you provide example code or point me to the right resource?

This is only a partial answer. I managed to solve the output part using the
GLPK.glp_ipt_obj_val
GLPK.glp_mip_obj_val
GLPK.glp_ipt_col_val
GLPK.glp_mip_col_val
functions, as in the following example:
static void writeMipSolution(glp_prob lp) {
String name = GLPK.glp_get_obj_name(lp);
double val = GLPK.glp_mip_obj_val(lp);
System.out.println(name + " = " + val);
int n = GLPK.glp_get_num_cols(lp);
for (int i = 1; i <= n; i++) {
name = GLPK.glp_get_col_name(lp, i);
val = GLPK.glp_mip_col_val(lp, i);
System.out.println(name + " = " + val);
}
}
Still investigating the input part, though.
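Once the column values have been read this way, the additional numeric checks mentioned in the question can be done in plain Java. The sketch below is only an illustration under assumptions: the names KnapsackCheckSketch and isFeasible are hypothetical, and the array x is assumed to hold the values returned by GLPK.glp_mip_col_val(lp, i) for i = 1..n, copied into a 0-based array.

```java
final class KnapsackCheckSketch {

    /**
     * Checks a 0/1 knapsack solution: every value must be 0 or 1 within
     * the tolerance, and the selected weights must fit the capacity.
     *
     * @param x        column values, assumed copied from glp_mip_col_val
     * @param weights  item weights, same order as x
     * @param capacity knapsack capacity
     * @param tol      numeric tolerance for the solver's floating-point output
     */
    static boolean isFeasible(double[] x, double[] weights, double capacity, double tol) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double rounded = Math.rint(x[i]);
            if (Math.abs(x[i] - rounded) > tol || rounded < 0 || rounded > 1) {
                return false; // not a clean 0/1 value
            }
            total += rounded * weights[i];
        }
        return total <= capacity + tol;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0, 0.9999999}; // hypothetical solver output
        double[] weights = {4.0, 5.0, 3.0};
        System.out.println(isFeasible(x, weights, 7.0, 1e-6));
    }
}
```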

Calculate Dice Roll from Text Field

QUESTION:
How can I read the string "d6+2-d4" so that each d# will randomly generate a number within the parameter of the dice roll?
CLARIFIER:
I want to read a string and have it so when a d# appears, it will randomly generate a number such as to simulate a dice roll. Then, add up all the rolls and numbers to get a total. Much like how Roll20 does with their /roll command for an example. If !clarifying {lstThen.add("look at the Roll20 and play with the /roll command to understand it")} else if !understandStill {lstThen.add("I do not know what to say, someone else could try explaining it better...")}
Info:
I was making a Java program for Dungeons and Dragons, only to find that I have come across a problem in figuring out how to calculate the user input: I do not know how to evaluate a string such as this.
I theorize that I may need Java's eval at the end. I do know what I want to happen/have a theory on how to execute (this is more so PseudoCode than Java):
Random rand = new Random();
int i = 0;
String toEval;
String char;
String roll = txtField.getText();
while (i<roll.length) {
check if character at i position is a d, then highlight the numbers
after d until it comes to a special character/!aNumber
// so if d was found before 100, it will then highlight 100 and stop
// if the character is a symbol or the end of the string
if d appears {
char = rand.nextInt(#);
i + #'s of places;
// so when i++ occurs, it will move past whatever d# was in case
// d# was something like d100, d12, or d5291
} else {
char = roll.length[i];
}
toEval = toEval + char;
i++;
}
perform evaluation method on toEval to get a resulting number
list.add(roll + " = " + evaluated toEval);
EDIT:
With weston's help, I have homed in on what is likely needed: using a splitter with an array, it can detect certain symbols and add them to a list. However, it is my fault for not clarifying what else was needed. The pseudocode above doesn't fully help, so this is what else I need to figure out.
roll.split("(+-/*^)");
As this part is what is also tripping me up. Should I make splits where there are numbers too? So an equation like:
String[] numbers = roll.split("(+-/*^)");
String[] symbols = roll.split("1234567890d")
// Rough idea for long way
loop statement {
loop to check for parentheses {
set operation to be done first
}
if symbol {
loop for symbol check {
perform operations
}}} // ending this since it looks like a bad way to do it...
// Better idea, originally thought up today (5/11/15)
int val[];
int re = 1;
loop {
if (list[i].containsIgnoreCase(d)) {
val[]=list[i].splitIgnoreCase("d");
list[i] = 0;
while (re <= val[0]) {
list[i] = list[i] + (rand.nextInt(val[1]) + 1);
re++;
}
}
}
// then create a string out of list[]/numbers[] and put together with
// symbols[] and use Java's evaluator for the String
weston had it; it just seemed like it wasn't working for me (until I realised I hadn't been specific about what I wanted). So, to update, the string I want evaluated is below (I know it's a little unorthodox, but it's to make a point; I also hope this clarifies further what is needed to make it work):
(3d12^d2-2)+d4(2*d4/d2)
From reading this, you may see the spots that I do not know how to perform very well... But that is why I am asking all you lovely, smart programmers out there! I hope I asked this clearly enough and thank you for your time :3

The trick with any programming problem is to break it up and write a method for each part, so below I have a method for rolling one die, which is called by the one for rolling many.
private Random rand = new Random();

/**
 * @param roll can be a multipart roll which is run and added up, e.g. d6+2-d4
 */
public int multiPartRoll(String roll) {
    String[] parts = roll.split("(?=[+-])"); // split before + or -, keeping the sign
    int total = 0;
    for (String partOfRoll : parts) { // roll each die specified
        total += singleRoll(partOfRoll);
    }
    return total;
}

/**
 * @param roll can be a fixed value (examples: -1, +2, 15) or a die to roll
 *             (examples: d6, +d20, -d100)
 */
public int singleRoll(String roll) {
    int di = roll.indexOf('d');
    if (di == -1) { // no 'd', so this is a fixed value
        return Integer.parseInt(roll);
    }
    int diceSize = Integer.parseInt(roll.substring(di + 1)); // value of string after 'd'
    int result = rand.nextInt(diceSize) + 1; // roll the die
    if (roll.startsWith("-")) { // negate if necessary
        result = -result;
    }
    return result;
}
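The two methods above roll a single die per part. A possible extension that also accepts a count before the d (as in 3d12) is sketched below. The names DiceSketch and roll are hypothetical, and the Random is passed in so a seeded instance can be used for repeatable tests. It still only handles + and -; the ^, *, / and parentheses from the question's edit would need a real expression parser.

```java
import java.util.Random;

final class DiceSketch {

    /** Evaluates rolls such as "d6+2-d4" or "3d12-2" (only + and - are supported). */
    static int roll(String expr, Random rand) {
        int total = 0;
        // Split before each sign so every part keeps its own + or -.
        for (String part : expr.replace(" ", "").split("(?=[+-])")) {
            if (part.isEmpty()) {
                continue;
            }
            int sign = part.startsWith("-") ? -1 : 1;
            boolean signed = part.startsWith("+") || part.startsWith("-");
            String body = signed ? part.substring(1) : part;
            int d = body.indexOf('d');
            if (d < 0) { // fixed value, no die involved
                total += sign * Integer.parseInt(body);
                continue;
            }
            int count = (d == 0) ? 1 : Integer.parseInt(body.substring(0, d));
            int sides = Integer.parseInt(body.substring(d + 1));
            for (int i = 0; i < count; i++) {
                total += sign * (rand.nextInt(sides) + 1);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Random rand = new Random();
        System.out.println("d6+2-d4 = " + roll("d6+2-d4", rand));
        System.out.println("3d12-2 = " + roll("3d12-2", rand));
    }
}
```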

adding and subtracting for BODMAS system

I have been building a simple formula calculator and have gotten stuck with addition and subtraction. As you should know, when calculating an equation, you follow the arithmetic rules of precedence, i.e. brackets, order: power functions, division, multiplication, addition and subtraction. The problem is that addition and subtraction are given equal priority, so therefore you can read it from left to right. Here is my code so far:
{
ArrayList<String> equation = new ArrayList<>(java.util.Arrays.asList("2","-","2","+","5"));
while(equation.contains("+")){
addMe(equation);
}
while(equation.contains("-")){
minusMe(equation);
}
}
public static void addMe(ArrayList<String> numberList){
for (int i = 0, n = numberList.size(); i < n; i++) {
String value = (String) numberList.get(i);
if(value.equals("+")){
String wordBefore = (String) numberList.get(i-1);
String wordAfter = (String) numberList.get(i+1);
System.out.println("This is the word before " + wordBefore);
System.out.println("This is the word after " + wordAfter);
double doubleFromBefore = Double.parseDouble(wordBefore);
double doubleFromAfter = Double.parseDouble(wordAfter);
double answer = doubleFromBefore + doubleFromAfter;
System.out.println("This is the answer: " + answer);
String stringAnswer = String.valueOf(answer);
String newNum2 = value.replace(value, stringAnswer);
numberList.set(i,newNum2);
numberList.remove(i-1);
numberList.remove(i);
break;
}
}
}
The minusMe method is exactly the same as the addMe method except with "-" in relevant places. The problem I am having is getting the equation read from left to right one item at a time and either doing the add or subtract method. Ideally I think I need to combine my 2 while loops with an iterator, to solve the problem but my attempts haven't worked. Any idea as to if this will solve my problem? If so please provide amended loop.
Regards

Have a look at this
java.util.List<String> equation = java.util.Arrays.asList("2","-","2","+","5");
java.util.Iterator<String> equIterator = equation.iterator();
int result = 0;
int multiplier = 1;
while(equIterator.hasNext()){
String operandOrOperator = equIterator.next();
if(operandOrOperator.equals("+")){
multiplier=1;
}else if(operandOrOperator.equals("-")){
multiplier=-1;
}else if(operandOrOperator.equals("*")){
result*=Integer.parseInt(equIterator.next()); // Assuming that next element will be there always after operator.
}else{
result+=(multiplier * Integer.parseInt(operandOrOperator));
}
}
System.out.println("Final result : " + result);

You are doing this all wrong. You need to use at least a recursive-descent expression parser, or Dijkstra's shunting-yard algorithm, maybe even a parser generator if this is going to grow into some kind of a language. You will find all these things via a web search.
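To give an idea of what a recursive-descent parser looks like, here is a minimal sketch. It is one possible shape and not a complete calculator: the class name ExprSketch and its grammar are assumptions, and it supports only +, -, *, /, unary minus and parentheses (no power functions). Each precedence level gets its own method, and operators of equal priority are applied left to right inside that method's loop.

```java
final class ExprSketch {
    private final String src;
    private int pos;

    private ExprSketch(String s) {
        src = s.replace(" ", "");
    }

    static double eval(String s) {
        ExprSketch p = new ExprSketch(s);
        double v = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return v;
    }

    // expr := term (('+' | '-') term)*
    private double expr() {
        double v = term();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '+') { pos++; v += term(); }
            else if (c == '-') { pos++; v -= term(); }
            else break;
        }
        return v;
    }

    // term := factor (('*' | '/') factor)*
    private double term() {
        double v = factor();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '*') { pos++; v *= factor(); }
            else if (c == '/') { pos++; v /= factor(); }
            else break;
        }
        return v;
    }

    // factor := number | '(' expr ')' | '-' factor
    private double factor() {
        if (pos >= src.length()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        char c = src.charAt(pos);
        if (c == '-') { pos++; return -factor(); }
        if (c == '(') {
            pos++;
            double v = expr();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Number expected at index " + pos);
        }
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2-2+5"));   // prints 5.0
        System.out.println(eval("2+3*4"));   // prints 14.0
    }
}
```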
