My original tree was much bigger, but since I've been stuck on this issue for quite some time I decided to try to simplify it. I ended up with something like this:
As you can see, I only have a single attribute called "LarguraBandaRede" with 3 possible nominal values "Congestionado", "Livre" and "Merda".
After that I exported the j48.model from Weka to use in my Java code.
With this piece of code I import the model to use as a classifier:
ObjectInputStream objectInputStream = new ObjectInputStream(in);
classifier = (J48) objectInputStream.readObject();
After that I created an ArrayList of my attributes and an Instances object:
for (int i = 0; i < features.length; i++) {
    String feature = features[i];
    Attribute attribute;
    if (feature.equals("TamanhoDados(Kb)")) {
        // numeric attribute
        attribute = new Attribute(feature);
    } else {
        // nominal attribute: build the list of possible values
        String[] strings = null;
        if (i == 0) strings = populateAttributes(7);
        if (i == 1) strings = populateAttributes(10);
        ArrayList<String> attValues = new ArrayList<String>(Arrays.asList(strings));
        attribute = new Attribute(feature, attValues);
    }
    atts.add(attribute);
}
where populateAttributes gives the possible values for each attribute, in this case "Livre, Merda, Congestionado" for LarguraBandaRede and "Sim, Nao" for Resultado, my class attribute.
Instances instances = new Instances("header",atts,atts.size());
instances.setClassIndex(instances.numAttributes()-1);
After creating my Instances object, it is time to create the instances that I'm trying to classify:
Instance instanceLivre = new DenseInstance(features.length);
Instance instanceMediano = new DenseInstance(features.length);
Instance instanceCongestionado = new DenseInstance(features.length);
instanceLivre.setDataset(instances);
instanceMediano.setDataset(instances);
instanceCongestionado.setDataset(instances);
Then I set each of these instances with the 3 possible values for "LarguraBandaRede": 'instanceLivre' with "Livre", 'instanceMediano' with "Merda" and 'instanceCongestionado' with "Congestionado".
After that I simply classify these 3 instances using the classifyInstance method:
System.out.println(instance.toString());
double resp = classifier.classifyInstance(instance);
System.out.println("valor: "+resp);
and this is my result:
As you can see, the instance that has "Merda" as "LarguraBandaRede" was classified as the same class as Congestionado, the class 'Nao'. But that doesn't make any sense, since the tree above clearly shows that when "LarguraBandaRede" is "Merda" or "Livre" the class should be the same.
So that's my question: how did this happen and how can I fix it?
Thanks in advance.
EDIT
I didn't know that this:
made any difference in the way the model works. It turns out we have to follow that same order when feeding a nominal attribute with its possible values.
Have you checked whether the order of the nominal attribute values in the Weka model matches the order produced by your populateAttributes method?
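For example, here is a minimal sketch of what I mean (the value order below is only an assumption; use whatever order the training ARFF header actually declares):

// assumption: the model was trained on a header like
//   @attribute LarguraBandaRede {Congestionado, Livre, Merda}
// so the runtime attribute has to list the values in that exact order,
// otherwise setValue("Merda") is mapped to the wrong internal index
ArrayList<String> larguraValues = new ArrayList<String>(
        Arrays.asList("Congestionado", "Livre", "Merda"));
Attribute larguraBandaRede = new Attribute("LarguraBandaRede", larguraValues);

ArrayList<String> resultadoValues = new ArrayList<String>(Arrays.asList("Sim", "Nao"));
Attribute resultado = new Attribute("Resultado", resultadoValues);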
Related
In Weka (using Java), I would like to successively fit classifiers to different subsets of attributes of the same dataset.
Is there a way to build the Instances object only once and then remove the non-selected attributes only temporarily, so they can be efficiently restored and used later to build another classifier, without having to create a totally new Instances object from scratch every time?
I am aware of the method deleteAttributeAt(), whose documentation says that
"A deep copy of the attribute information is performed before the attribute is deleted"
and also of the Remove filter, but I'm not sure either is what I need.
Create new Instances objects at each stage and use them appropriately.
For example, the code below uses an Instances object with the class attribute removed and the values normalized to build a clusterer.
Use rawData to get back to the original instances. Hope this helps.
final SimpleKMeans kmeans = new SimpleKMeans();
final String[] options = weka.core.Utils
.splitOptions("-init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 10 -A \"weka.core.EuclideanDistance -R first-last\" -I 500 -num-slots 1 -S 50");
kmeans.setOptions(options);
kmeans.setSeed(1000);
kmeans.setPreserveInstancesOrder(true);
kmeans.setNumClusters(5);
kmeans.setMaxIterations(1000);
final BufferedReader datafile = readDataFile("/Users/data.arff");
final Instances rawData = new Instances(datafile);
rawData.setClassIndex(classIndex);
//remove class column[0] from cluster
final Remove removeFilter = new Remove();
removeFilter.setAttributeIndices("" + (rawData.classIndex() + 1));
removeFilter.setInputFormat(rawData);
final Instances dataNoClass = Filter.useFilter(rawData, removeFilter);
//normalize
final Normalize normalizeFilter = new Normalize();
normalizeFilter.setIgnoreClass(true);
normalizeFilter.setInputFormat(dataNoClass);
final Instances data = Filter.useFilter(dataNoClass, normalizeFilter);
kmeans.buildClusterer(data);
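And if you need to relate the cluster assignments back to the original rows, here is a small sketch that relies on setPreserveInstancesOrder(true) above:

// index i in the assignments array corresponds to row i in rawData,
// because the Remove and Normalize filters keep the instance order
final int[] assignments = kmeans.getAssignments();
for (int i = 0; i < assignments.length; i++) {
    System.out.println(rawData.instance(i) + " -> cluster " + assignments[i]);
}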
I have this piece of data (this is just one part of one line of the whole file):
000000055555444444******4444 YY
I created this CSV config file to be able to read each part of the data and parse it:
128-12,140-22,YY
The first pair (128-12) represents the position in the line where reading starts and the number of characters to read; that first pair is for the account number.
The second pair is for the card number.
And the third parameter is for the registry name.
Anyway, what I do is String.split(",") and then assign array[0] as the account number, and so on.
But I want to change that CSV config file to a Properties file, and I'm not sure how to implement that solution. If I use a Properties file, I'd have to add a bunch of if/then checks in order to properly map my values. Here's what I'm thinking of doing:
Properties cfg = new Properties();
cfg.put("fieldName", "accountNumber");
cfg.put("startPosition", "128");
cfg.put("length", "12");
But then I'd have to say if("fieldName".equals("accountNumber")) to assign accountNumber. Is there a way to implement this so that I can avoid all those decisions? Right now, with my solution, I don't use ifs at all; I just say accountNumber = array[0]; and that's it. But I don't think that's a good solution, and I think that using a Properties file would be more elegant or efficient.
EDIT:
This probably needs some more clarification. This data is part of a parsing program that I'm currently writing for a client; the data holds information for many of their customers, and I have to parse a huge mess of data that I receive from them into something more readable in order to convert it to a PDF file. So far the program is in production, but I'm trying to refactor it a little bit. All the customer's information is saved into different Registry classes, each class having its own set of fields with unique information. Let's say that this is what RegistryYY would look like:
class RegistryYY extends Registry{
String name;
String lastName;
PhysicalAddress address;
public RegistryYY(String dataFromFile) {
}
}
I want to implement the Properties solution because that way each Registry class could own the Properties used for parsing the file, or rather for interpreting the data correctly; a Registry should know what data it needs from the file, right? If I do it that way, I could make each Registry an Observer: each one would decide whether the current line read from the file belongs to it by checking the registry name stored in that line, and then return an initialized Registry to the calling object, which only cares about receiving and storing a Registry.
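Something like this rough sketch is what I have in mind (the rule value and the accepts method are only illustrative, and it reuses the getField helper shown in EDIT 2 below):

// each Registry owns the layout of the fields it cares about
class RegistryYY extends Registry {
    // assumed rule format "start-length", same as the config entries above
    private static final String NAME_RULE = "162-2";

    // the registry decides whether a raw line belongs to it
    static boolean accepts(String dataFromFile) {
        return "YY".equals(getField(NAME_RULE, dataFromFile));
    }

    public RegistryYY(String dataFromFile) {
        // parse only the fields this registry needs
    }
}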
EDIT 2:
I created this function to return the value stored at a given position in the line:
public static String getField(String fieldParams, String rawData){
// splits the field
String[] fields = fieldParams.split("-");
int fieldStart = Integer.parseInt(fields[0]); // get initial position of the field
int fieldLen = Integer.parseInt(fields[1]); // get length of field
// gets field value
String fieldValue = FieldParser.getStringValue(rawData, fieldStart, fieldLen);
return fieldValue;
}
Which works with the CSV file, I'd like to change the implementation to work with the Property file instead.
Is there any reason why you need to have the record layout exposed to the outside world? Does it need to be configurable?
I think your proposed approach of using a Properties file is better than your current CSV approach, since it is more descriptive and meaningful. I would also add a "type" attribute to your property definition to drive the conversion, i.e. Numeric/String/Date/Boolean.
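For example, something along these lines (the key names are just illustrative):

statement.accountNumber.startPosition = 128
statement.accountNumber.length = 12
statement.accountNumber.type = Numeric
statement.cardNumber.startPosition = 140
statement.cardNumber.length = 22
statement.cardNumber.type = String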
I wouldn't use an "if" statement to process your property file. You can load all the properties into an array at the beginning and then iterate over it for each line of your data file, processing each section accordingly, something like the pseudo code below.
for each line of data-file {
    SomeClass myClass = myClassBuilder(data-file-line)
}

SomeClass myClassBuilder(String data-file-line) {
    Map<column, value> result = new HashMap<>()
    for each attribute of property-file-list {
        switch attribute_type {
            Integer:
                result.put(fieldname, makeInteger(data-file-line, property_attribute))
            Date:
                result.put(fieldname, makeDate(data-file-line, property_attribute))
            Boolean:
                result.put(fieldname, makeBoolean(data-file-line, property_attribute))
            String:
                result.put(fieldname, makeString(data-file-line, property_attribute))
            ------- etc
        }
    }
    return new SomeClass(result)
}
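In plain Java that loop could look roughly like this minimal sketch (the key names and the "start-length" rule format are assumptions carried over from the question, and type conversion is left out for brevity):

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class FixedWidthParser {

    // extracts every configured field from one fixed-width line;
    // each property value is a "start-length" rule, e.g. accountNumber=128-12
    public static Map<String, String> parseLine(String line, Properties layout) {
        Map<String, String> result = new LinkedHashMap<String, String>();
        for (String field : layout.stringPropertyNames()) {
            String[] rule = layout.getProperty(field).split("-");
            int start = Integer.parseInt(rule[0].trim());
            int length = Integer.parseInt(rule[1].trim());
            result.put(field, line.substring(start, start + length));
        }
        return result;
    }

    public static void main(String[] args) {
        // made-up layout and line, just to show the loop; the real offsets would be 128-12 etc.
        Properties layout = new Properties();
        layout.setProperty("accountNumber", "0-5");
        layout.setProperty("cardNumber", "5-4");
        System.out.println(parseLine("12345XXXX YY", layout));
    }
}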
If your record layout doesn't need to be configurable, then you could do all the conversion inside your Java application and not even use a Properties file.
If you could get your data in XML format then you could use the JAXB framework and simply have your data definition in an XML file.
First of all, thanks to the guys who helped me, @robbie70, @RC. and @VinceEmigh.
I used YAML to parse a file called "test.yml" with the following information in it:
statement:
  - fieldName: accountNumber
    startPosition: 128
    length: 12
  - fieldName: cardNumber
    startPosition: 140
    length: 22
  - fieldName: registryName
    startPosition: 162
    length: 2
This is what I made:
// Start of main
String fileValue = "0222000000002222F 00000000000111110001000000099999444444******4444 YY";
YamlReader reader = new YamlReader(new FileReader("test.yml"));
Object object = reader.read();
System.out.println(object);
Map map = (Map) object;
List list = (List) map.get("statement");
for(int i = 0; i < list.size(); i++) {
Map map2 = (Map) list.get(i);
System.out.println("Value: " + foo(map2, fileValue));
}
}
// End of main
public static String foo(Map map, String source) {
int startPos = Integer.parseInt((String) map.get("startPosition"));
int length = Integer.parseInt((String) map.get("length"));
return getField(startPos, length, source);
}
public static String getField(int start, int length, String source) {
return source.substring(start, start+length);
}
It correctly displays the output:
Value: 000000099999
Value: 444444******4444
Value: YY
I know that the config file may have some lists and other unnecessary values and whatnot, and that the program could use a little improvement, but I think I can take it from here and implement what I had in mind.
EDIT:
I made this other version using Apache Commons Configuration; this is what I have in the configuration properties file:
#properties defining the statement file
#properties for account number
statement.accountNumber.startPosition = 128
statement.accountNumber.length = 12
statement.account.rules = ${statement.accountNumber.startPosition} ${statement.accountNumber.length}
#properties for card number
statement.cardNumber.startPosition = 140
statement.cardNumber.length = 22
statement.card.rules = ${statement.cardNumber.startPosition} ${statement.cardNumber.length}
#properties for registry name
statement.registryName.startPosition = 162
statement.registryName.length = 2
statement.registry.rules = ${statement.registryName.startPosition} ${statement.registryName.length}
And this is how I read it:
// Inside Main
String valorLeido = "0713000000007451D 00000000000111110001000000099999444444******4444 YY";
Parameters params = new Parameters();
FileBasedConfigurationBuilder<FileBasedConfiguration> builder =
new FileBasedConfigurationBuilder<FileBasedConfiguration>(PropertiesConfiguration.class)
.configure(params.properties()
.setFileName("config.properties"));
try {
Configuration config = builder.getConfiguration();
Iterator<String> keys = config.getKeys();
String account = getValue(getRules(config, "statement.account.rules"), valorLeido);
String cardNumber = getValue(getRules(config, "statement.card.rules"), valorLeido);
String registryName = getValue(getRules(config, "statement.registry.rules"), valorLeido);
} catch (org.apache.commons.configuration2.ex.ConfigurationException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// End of Main
public static String getRules(Configuration config, String rules) {
return config.getString(rules);
}
public static String getValue(String rules, String source) {
String[] tokens = rules.split(" ");
int startPos = Integer.parseInt(tokens[0]);
int length = Integer.parseInt(tokens[1]);
return getField(startPos, length, source);
}
I'm not entirely sure yet; the YAML file looks simpler, but I really like the control I get with Apache Commons Configuration, since I can pass the Configuration object around to each registry, and the registry knows which "rules" it wants to read. Let's say the Registry class only cares about "statement.registry.rules" and that's it. With the YAML option I'm not entirely sure how to do that yet; maybe I'll need to experiment with both options a little more, but I like where this is going.
PS:
That weird value I used in fileValue is what I'm dealing with; now add nearly 1,000 characters to the length of the line and you'll understand why I want a config file for parsing it (don't ask why... clients be crazy).
I am using JHDF5 to log a collection of values to an HDF5 file. I am currently using two ArrayLists to do this, one with the values and one with the names of the values.
ArrayList<String> valueList = new ArrayList<String>();
ArrayList<String> nameList = new ArrayList<String>();
valueList.add("Value1");
valueList.add("Value2");
nameList.add("Name1");
nameList.add("Name2");
IHDF5Writer writer = HDF5Factory.configure("My_Log").keepDataSetsIfTheyExist().writer();
HDF5CompoundType<List<?>> type = writer.compound().getInferredType("", nameList, valueList);
writer.compound().write("log1", type, valueList);
writer.close();
This correctly writes the values to the file My_Log, into the dataset "log1". However, this example always overwrites the previous log of the values in the dataset "log1". I want to be able to log to the same dataset every time, appending the latest log to the next line/index of the dataset. For example, if I were to change the value of "Name2" to "Value3" and log the values, and then change "Name1" to "Value4" and "Name2" to "Value5" and log the values, the dataset should look like this:
I thought the keepDataSetsIfTheyExist() option would prevent the dataset from being overwritten, but apparently it doesn't work that way.
Something similar to what I want can be achieved in some cases with writer.compound().writeArrayBlock(), specifying at what index the array block shall be written. However, this solution doesn't seem compatible with my current code, where I have to use lists to handle my data.
Is there some option to achieve this that I have overlooked, or can't this be done with JHDF5?
I don't think that will work. It is not quite clear to me, but I believe the getInferredType() you are using creates a dataset with 2 name -> value entries, so it is effectively creating an object inside the HDF5 file. The best solution I could come up with was to read the previous values and add them to the valueList before writing:
ArrayList<String> valueList = new ArrayList<>();
valueList.add("Value1");
valueList.add("Value2");
try (IHDF5Reader reader = HDF5Factory.configure("My_Log.h5").reader()) {
String[] previous = reader.string().readArray("log1");
for (int i = 0; i < previous.length; i++) {
valueList.add(i, previous[i]);
}
} catch (HDF5FileNotFoundException ex) {
// Nothing to do here.
}
MDArray<String> values = new MDArray<>(String.class, new long[]{valueList.size()});
for (int i = 0; i < valueList.size(); i++) {
values.set(valueList.get(i), i);
}
try (IHDF5Writer writer = HDF5Factory.configure("My_Log.h5").writer()) {
writer.string().writeMDArray("log1", values);
}
If you call this code a second time with "Value3" and "Value4" instead, you will get 4 values. This sort of solution might become unpleasant if you start to have hierarchies of datasets however.
To solve your issue, you need to define the dataset log1 as extendible so that it can store an unknown number of log entries (that are generated over time) and write these using a point or hyperslab selection (otherwise, the dataset will be overwritten).
If you are not bound to a specific technology to handle HDF5 files, you may wish to take a look at HDFql, which is a high-level language to manage HDF5 files easily. A possible solution for your use-case using HDFql (in Java) is:
public class Example
{
public static class Log
{
String name1;
String name2;
}
public static boolean doSomething(Log log)
{
log.name1 = "Value1";
log.name2 = "Value2";
return true;
}
public static void main(String args[])
{
// declare variables
Log log = new Log();
int variableNumber;
// create an HDF5 file named 'My_Log.h5' and use (i.e. open) it
HDFql.execute("CREATE AND USE FILE My_Log.h5");
// create an extendible HDF5 dataset named 'log1' of data type compound
HDFql.execute("CREATE DATASET log1 AS COMPOUND(name1 AS VARCHAR, name2 AS VARCHAR)(0 TO UNLIMITED)");
// register variable 'log' for subsequent usage (by HDFql)
variableNumber = HDFql.variableRegister(log);
// call function 'doSomething' that does something and populates variable 'log' with an entry
while(doSomething(log))
{
// alter (i.e. extend) dataset 'log1' to +1 (i.e. add a new row)
HDFql.execute("ALTER DIMENSION log1 TO +1");
// insert (i.e. write) data stored in variable 'log' into dataset 'log1' using a point selection
HDFql.execute("INSERT INTO log1(-1) VALUES FROM MEMORY " + variableNumber);
}
}
}
I'm trying to build a text classifier using Weka, but the class probabilities returned by distributionForInstance are 1.0 for one class and 0.0 for all the others, so classifyInstance always predicts the same class. Something in the training doesn't work correctly.
ARFF training
@relation test1
@attribute tweetmsg String
@attribute classValues {politica,sport,musicatvcinema,infogeneriche,fattidelgiorno,statopersonale,checkin,conversazione}
@data
"Renzi Berlusconi Salvini Bersani",politica
"Allegri insulta la terna arbitrale",sport
"Bravo Garcia",sport
Training methods
public void trainClassifier(final String INPUT_FILENAME) throws Exception
{
getTrainingDataset(INPUT_FILENAME);
//trainingInstances consists of feature vector of every input
for(Instance currentInstance : inputDataset)
{
Instance currentFeatureVector = extractFeature(currentInstance);
currentFeatureVector.setDataset(trainingInstances);
trainingInstances.add(currentFeatureVector);
}
classifier = new NaiveBayes();
try {
//classifier training code
classifier.buildClassifier(trainingInstances);
//storing the trained classifier to a file for future use
weka.core.SerializationHelper.write("NaiveBayes.model",classifier);
} catch (Exception ex) {
System.out.println("Exception in training the classifier."+ex);
}
}
private Instance extractFeature(Instance inputInstance) throws Exception
{
String tweet = inputInstance.stringValue(0);
StringTokenizer defaultTokenizer = new StringTokenizer(tweet);
List<String> tokens=new ArrayList<String>();
while (defaultTokenizer.hasMoreTokens())
{
String t= defaultTokenizer.nextToken();
tokens.add(t);
}
Iterator<String> a = tokens.iterator();
while(a.hasNext())
{
String token=(String) a.next();
String word = token.replaceAll("#","");
if(featureWords.contains(word))
{
double cont=featureMap.get(featureWords.indexOf(word))+1;
featureMap.put(featureWords.indexOf(word),cont);
}
else{
featureWords.add(word);
featureMap.put(featureWords.indexOf(word), 1.0);
}
}
attributeList.clear();
for(String featureWord : featureWords)
{
attributeList.add(new Attribute(featureWord));
}
attributeList.add(new Attribute("Class", classValues));
int indices[] = new int[featureMap.size()+1];
double values[] = new double[featureMap.size()+1];
int i=0;
for(Map.Entry<Integer,Double> entry : featureMap.entrySet())
{
indices[i] = entry.getKey();
values[i] = entry.getValue();
i++;
}
indices[i] = featureWords.size();
values[i] = (double)classValues.indexOf(inputInstance.stringValue(1));
trainingInstances = createInstances("TRAINING_INSTANCES");
return new SparseInstance(1.0,values,indices,1000000);
}
private void getTrainingDataset(final String INPUT_FILENAME)
{
try{
ArffLoader trainingLoader = new ArffLoader();
trainingLoader.setSource(new File(INPUT_FILENAME));
inputDataset = trainingLoader.getDataSet();
}catch(IOException ex)
{
System.out.println("Exception in getTrainingDataset Method");
}
System.out.println("dataset "+inputDataset.numAttributes());
}
private Instances createInstances(final String INSTANCES_NAME)
{
//create an Instances object with initial capacity as zero
Instances instances = new Instances(INSTANCES_NAME,attributeList,0);
//sets the class index as the last attribute
instances.setClassIndex(instances.numAttributes()-1);
return instances;
}
public static void main(String[] args) throws Exception
{
Classificatore wekaTutorial = new Classificatore();
wekaTutorial.trainClassifier("training_set_prova_tent.arff");
wekaTutorial.testClassifier("testing.arff");
}
public Classificatore()
{
attributeList = new ArrayList<Attribute>();
initialize();
}
private void initialize()
{
featureWords= new ArrayList<String>();
featureMap = new TreeMap<>();
classValues= new ArrayList<String>();
classValues.add("politica");
classValues.add("sport");
classValues.add("musicatvcinema");
classValues.add("infogeneriche");
classValues.add("fattidelgiorno");
classValues.add("statopersonale");
classValues.add("checkin");
classValues.add("conversazione");
}
TESTING METHODS
public void testClassifier(final String INPUT_FILENAME) throws Exception
{
getTrainingDataset(INPUT_FILENAME);
//trainingInstances consists of feature vector of every input
Instances testingInstances = createInstances("TESTING_INSTANCES");
for(Instance currentInstance : inputDataset)
{
//extractFeature method returns the feature vector for the current input
Instance currentFeatureVector = extractFeature(currentInstance);
//Make the currentFeatureVector to be added to the trainingInstances
currentFeatureVector.setDataset(testingInstances);
testingInstances.add(currentFeatureVector);
}
try {
//Classifier deserialization
classifier = (Classifier) weka.core.SerializationHelper.read("NaiveBayes.model");
//classifier testing code
for(Instance testInstance : testingInstances)
{
double score = classifier.classifyInstance(testInstance);
double[] vv= classifier.distributionForInstance(testInstance);
for(int k=0;k<vv.length;k++){
System.out.println("distribution "+vv[k]); //this are the probabilities of the classes and as result i get 1.0 in one and 0.0 in all the others
}
System.out.println(testingInstances.attribute("Class").value((int)score));
}
} catch (Exception ex) {
System.out.println("Exception in testing the classifier."+ex);
}
}
I want to create a text classifier for short messages; this code is based on this tutorial http://preciselyconcise.com/apis_and_installations/training_a_weka_classifier_in_java.php . The problem is that the classifier predicts the wrong class for almost every message in testing.arff because the class probabilities are not correct. The training_set_prova_tent.arff has the same number of messages per class.
The example I'm following uses a featureWords.dat file and associates 1.0 with a word if it is present in a message; instead, I want to build my own dictionary from the words present in training_set_prova_tent plus the words present in testing, and associate with every word its number of occurrences.
P.S
I know that this is exactly what I can do with the StringToWordVector filter, but I haven't found any example that explains how to use this filter with two files: one for the training set and one for the test set. So it seemed easier to adapt the code I found.
Thank you very much
It seems like you changed the code from the website you referenced in some crucial points, but not in a good way. I'll try to draft what you're trying to do and what mistakes I've found.
What you (probably) wanted to do in extractFeature is
Split each tweet into words (tokenize)
Count the number of occurrences of these words
Create a feature vector representing these word counts plus the class
What you've overlooked in that method is
You never reset your featureMap. The line
Map<Integer,Double> featureMap = new TreeMap<>();
originally was at the beginning of extractFeature, but you moved it to initialize. That means you keep adding up the word counts and never reset them. For each new tweet, your word count also includes the word counts of all previous tweets. I'm sure that is not what you wanted.
You don't initialize featureWords with the words you want as features. Yes, you create an empty list, but you fill it iteratively with each tweet. The original code initialized it once in the initialize method and it never changed after that. There are two problems with that:
With each new tweet, new features (words) get added, so your feature vector grows with each tweet. That wouldn't be such a big problem (SparseInstance), but that means that
Your class attribute ends up at a different index each time. These two lines work for the original code, because featureWords.size() is basically a constant there, but in your code the class label will be at index 5, then 8, then 12, and so on, while it must be the same for every instance.
indices[i] = featureWords.size();
values[i] = (double) classValues.indexOf(inputInstance.stringValue(1));
This also manifests itself in the fact that you build a new attributeList with each new tweet, instead of only once in initialize, which is bad for already explained reasons.
There may be more stuff, but - as it is - your code is rather unfixable. What you want is much closer to the tutorial source code which you modified than your version.
Also, you should look into StringToWordVector because it seems like this is exactly what you want to do:
Converts String attributes into a set of attributes representing word occurrence (depending on the tokenizer) information from the text contained in the strings. The set of words (attributes) is determined by the first batch filtered (typically training data).
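For what it's worth, the usual idiom for applying it to separate training and test files is batch filtering. Here is a rough sketch (file names taken from your question, filter options left at their defaults):

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchFilterExample {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(new BufferedReader(new FileReader("training_set_prova_tent.arff")));
        Instances test = new Instances(new BufferedReader(new FileReader("testing.arff")));
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        // the dictionary (set of word attributes) is determined by the training data only
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(train);
        Instances trainVec = Filter.useFilter(train, filter);
        // the same dictionary is then applied to the test data
        Instances testVec = Filter.useFilter(test, filter);

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(trainVec);
        for (int i = 0; i < testVec.numInstances(); i++) {
            double[] dist = nb.distributionForInstance(testVec.instance(i));
            System.out.println(java.util.Arrays.toString(dist));
        }
    }
}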
I have written this method to build a Bayesian network, but I get an exception from the classifyInstance() method.
Here is my code:
public static double bayesNet1(Dataset data, Dataset testingSet) throws Exception {
Instances insts = convertTxtToARFF(data);
K2 learner = new K2();
MultiNomialBMAEstimator estimator = new MultiNomialBMAEstimator();
estimator.setUseK2Prior(true);
EditableBayesNet bn = new EditableBayesNet(insts);
bn.initStructure();
learner.buildStructure(bn, insts);
estimator.estimateCPTs(bn);
double error = 0;
Instances instsTest = convertTxtToARFF(testingSet);
for(int i=0; i<instsTest.numInstances()-1; i++) {
weka.core.Instance inst = instsTest.instance(i);
double predictedValue = bn.classifyInstance(inst);
if(inst.value(inst.classIndex())!= predictedValue)
error++;
}
return error/instsTest.numInstances();
}
And here is the exception:
java.lang.ArrayIndexOutOfBoundsException: 4
    at weka.classifiers.bayes.net.estimate.DiscreteEstimatorBayes.getProbability(DiscreteEstimatorBayes.java:106)
    at weka.classifiers.bayes.net.estimate.SimpleEstimator.distributionForInstance(SimpleEstimator.java:183)
    at weka.classifiers.bayes.BayesNet.distributionForInstance(BayesNet.java:386)
    at weka.classifiers.Classifier.classifyInstance(Classifier.java:84)
    at ensembleClassifiersV2.EnsembleClassifierV2.bayesNet1(EnsembleClassifierV2.java:1090)
    at ensembleClassifiersV2.EnsembleClassifierV2.performing(EnsembleClassifierV2.java:800)
    at ensembleClassifiersV2.EnsembleClassifierV2.main(EnsembleClassifierV2.java:1267)
Can anyone help me figure out what is wrong?
I had the same problem. My mistake was that I had not set the class index for the test data. As simple as that.
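In code, the missing line looks roughly like this (assuming the class is the last attribute, as in the training data):

// without this, classifyInstance()/distributionForInstance() fail on the test set
instsTest.setClassIndex(instsTest.numAttributes() - 1);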
I find that this error commonly occurs in the distributionForInstance() method for many of the different classifiers when you are dealing with nominal attributes.
If this is the case, it could be that the test data has a nominal attribute with an attribute value that the train data lacks.
In this case, it really depends upon what the best decision is for what you are doing. Perhaps checking the data itself for consistency is the first step and then you go from there.
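A quick way to check that, sketched here with train and test being your two Instances objects (it assumes they share the same attribute order):

// flag any nominal value that appears in the test set but not in the training set
for (int a = 0; a < train.numAttributes(); a++) {
    if (!train.attribute(a).isNominal()) continue;
    weka.core.Attribute trainAtt = train.attribute(a);
    weka.core.Attribute testAtt = test.attribute(a);
    for (int v = 0; v < testAtt.numValues(); v++) {
        if (trainAtt.indexOfValue(testAtt.value(v)) == -1) {
            System.out.println("Attribute '" + testAtt.name()
                    + "' has unseen value '" + testAtt.value(v) + "' in the test data");
        }
    }
}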