I trained a J48 model using the WEKA Java API.
Then I used classifyInstance() to classify my instances,
but the result is wrong.
My code is as follows:
Instances train = reader.getDataSet();
Instances test = reader_test.getDataSet();
train.setClassIndex(train.numAttributes() - 1);
Classifier cls = new J48();
cls.buildClassifier(train);
test.setClassIndex(test.numAttributes() - 1);
for (int i = 0; i < test.numInstances(); i++) {
    Instance inst = test.instance(i);
    double result = cls.classifyInstance(inst);
    System.out.println(train.classAttribute().value((int) result));
}
The result is always 0.0.
Finally, I called test.insertAttributeAt() before test.setClassIndex(), as follows:
test.insertAttributeAt(train.attribute(train.numAttributes() - 1), test.numAttributes());
The result became correct. I was very surprised!
However, most documentation does not use insertAttributeAt().
I want to understand why this call suddenly makes the result correct.
This should help. The likely cause is that your test file was missing the class attribute, so insertAttributeAt() made the test header match the training header; WEKA expects train and test instances to share the same structure.
BufferedReader datafile = readDataFile(TrainingFile);
Instances train = new Instances(datafile);
train.setClassIndex(train.numAttributes() - 1);
Classifier cls = new J48();
cls.buildClassifier(train);
DataSource testDataset = new DataSource(Test);
Instances test = testDataset.getDataSet();
test.setClassIndex(test.numAttributes() - 1);
for (int i = 0; i < test.numInstances(); i++) {
    Instance inst = test.instance(i);
    // actual class value
    double actualClassValue = test.instance(i).classValue();
    String actual = test.classAttribute().value((int) actualClassValue);
    // predicted class value
    double result = cls.classifyInstance(inst);
    String prediction = test.classAttribute().value((int) result);
    System.out.println("actual: " + actual + ", predicted: " + prediction);
}
You don't need insertAttributeAt() with this approach.
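If you want to check the compatibility yourself, here is a minimal sketch, assuming train and test are the loaded Instances (equalHeadersMsg() requires WEKA 3.7+; drop that call on older versions):
// WEKA expects the test header to match the training header.
// If the test ARFF lacks the class attribute, this check fails,
// which is when insertAttributeAt() becomes necessary.
if (!train.equalHeaders(test)) {
    System.err.println("Incompatible headers: " + train.equalHeadersMsg(test));
}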
File Conversion Code
// load CSV (InputFilename is a placeholder for your CSV file)
CSVLoader loader = new CSVLoader();
String InputFilename = "TrainingFileName";
loader.setSource(new File(InputFilename));
Instances data = loader.getDataSet();
// save ARFF (Path, Directory and Filename are placeholders for your output location)
ArffSaver saver = new ArffSaver();
saver.setInstances(data);
String FileT = Filename + ".arff";
saver.setFile(new File(Path + Directory + "\\" + FileT));
saver.writeBatch();
Change accordingly.
Thanks
I'm just a Lucene starter and I got stuck on a problem while changing from a RAMDirectory to an FSDirectory.
First, my code:
private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
new StandardAnalyzer(Version.LUCENE_43));
Directory DIR = FSDirectory.open(new File(INDEXLOC)); //INDEXLOC = "path/to/dir/"
// RAMDirectory DIR = new RAMDirectory();
// Index some made up content
IndexWriter writer =
new IndexWriter(DIR, iwc);
// Store both position and offset information
FieldType type = new FieldType();
type.setStored(true);
type.setStoreTermVectors(true);
type.setStoreTermVectorOffsets(true);
type.setStoreTermVectorPositions(true);
type.setIndexed(true);
type.setTokenized(true);
IDocumentParser p = DocumentParserFactory.getParser(f);
ArrayList<ParserDocument> DOCS = p.getParsedDocuments();
for (int i = 0; i < DOCS.size(); i++) {
Document doc = new Document();
Field id = new StringField("id", "doc_" + i, Field.Store.YES);
doc.add(id);
Field text = new Field("content", DOCS.get(i).getContent(), type);
doc.add(text);
writer.addDocument(doc);
}
writer.close();
// Get a searcher
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(DIR));
// Do a search using SpanQuery
SpanTermQuery fleeceQ = new SpanTermQuery(new Term("content", "zahl"));
TopDocs results = searcher.search(fleeceQ, 10);
for (int i = 0; i < results.scoreDocs.length; i++) {
ScoreDoc scoreDoc = results.scoreDocs[i];
System.out.println("Score Doc: " + scoreDoc);
}
IndexReader reader = searcher.getIndexReader();
AtomicReader wrapper = SlowCompositeReaderWrapper.wrap(reader);
Map<Term, TermContext> termContexts = new HashMap<Term, TermContext>();
Spans spans = fleeceQ.getSpans(wrapper.getContext(), new Bits.MatchAllBits(reader.numDocs()), termContexts);
int window = 2;// get the words within two of the match
while (spans.next() == true) {
Map<Integer, String> entries = new TreeMap<Integer, String>();
System.out.println("Doc: " + spans.doc() + " Start: " + spans.start() + " End: " + spans.end());
int start = spans.start() - window;
int end = spans.end() + window;
Terms content = reader.getTermVector(spans.doc(), "content");
TermsEnum termsEnum = content.iterator(null);
BytesRef term;
while ((term = termsEnum.next()) != null) {
// could store the BytesRef here, but String is easier for this
// example
String s = new String(term.bytes, term.offset, term.length);
DocsAndPositionsEnum positionsEnum = termsEnum.docsAndPositions(null, null);
if (positionsEnum.nextDoc() != DocIdSetIterator.NO_MORE_DOCS) {
int i = 0;
int position = -1;
while (i < positionsEnum.freq() && (position = positionsEnum.nextPosition()) != -1) {
if (position >= start && position <= end) {
entries.put(position, s);
}
i++;
}
}
}
System.out.println("Entries:" + entries);
}
It's just some code I found on a great website and wanted to try. Everything works great using the RAMDirectory, but if I change it to my FSDirectory it gives me a NullPointerException like:
Exception in thread "main" java.lang.NullPointerException at
com.org.test.TextDB.myMethod(TextDB.java:184) at
com.org.test.Main.main(Main.java:31)
The statement Terms content = reader.getTermVector(spans.doc(), "content"); seems to get no result and returns null, hence the exception. But why? With my RAMDirectory everything works fine.
It seems that the IndexWriter or the reader (I really don't know which) didn't write or read the field "content" properly. But I really don't know why it is written in a RAMDirectory and not in an FSDirectory.
Does anybody have an idea?
Gave this a quick test run, and I can't reproduce your issue.
I think the most likely issue here is old documents in your index. The way this is written, every run adds more documents to the index. Documents from previous runs won't get deleted or overwritten; they just stick around. So if you have run this before on the same directory, say before you added the line type.setStoreTermVectors(true);, some of your results may be these old documents without term vectors, and reader.getTermVector(...) returns null if the document does not store term vectors.
Of course, anything indexed in a RAMDirectory is dropped as soon as execution finishes, so the issue would not occur in that case.
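If you want the loop to tolerate such a mixed index, a minimal defensive sketch inside the spans loop is to skip documents that have no term vectors, since getTermVector() returns null for them:
Terms content = reader.getTermVector(spans.doc(), "content");
if (content == null) {
    // this document was indexed without term vectors (e.g. by an older run); skip it
    continue;
}
TermsEnum termsEnum = content.iterator(null);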
The simple solution would be to delete the index directory and run it again.
If you want to start with a fresh index when you run this, you can set that up through the IndexWriterConfig:
private static IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_43,
        new StandardAnalyzer(Version.LUCENE_43))
    .setOpenMode(IndexWriterConfig.OpenMode.CREATE);
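With OpenMode.CREATE the writer discards whatever index already exists in the directory when it is opened, so every run starts from an empty index.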
That's a guess, of course, but it seems consistent with the behavior you've described.
I am trying to classify an instance in Java using the Weka library and the tutorials online.
I built a model and saved it to disk using this code.
public void makeModel() throws Exception {
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("data.arff"));
    Instances structure = loader.getDataSet();
    structure.setClassIndex(1);
    // train NaiveBayes
    NaiveBayesMultinomial n = new NaiveBayesMultinomial();
    FilteredClassifier f = new FilteredClassifier();
    StringToWordVector s = new StringToWordVector();
    s.setUseStoplist(true);
    s.setWordsToKeep(100);
    f.setFilter(s);
    f.setClassifier(n);
    f.buildClassifier(structure);
    Evaluation eval = new Evaluation(structure);
    eval.crossValidateModel(f, structure, 10, new Random(1));
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));
    // output generated model
    //System.out.println(f);
    ObjectOutputStream oos = new ObjectOutputStream(
            new FileOutputStream("classifier.model"));
    oos.writeObject(f);
    oos.flush();
    oos.close();
}
------------------------ Output-------------
Results
Correctly Classified Instances 20158 79.6948 %
Incorrectly Classified Instances 5136 20.3052 %
Kappa statistic 0.6737
Mean absolute error 0.0726
Root mean squared error 0.2025
Relative absolute error 38.7564 %
Root relative squared error 66.1815 %
Coverage of cases (0.95 level) 96.4142 %
Mean rel. region size (0.95 level) 27.7531 %
Total Number of Instances 25294
Then I used the same model to classify an unlabelled instance.
public void classify() throws Exception {
    FilteredClassifier cls = (FilteredClassifier) weka.core.SerializationHelper.read("classifier.model");
    Instances unlabeled = new Instances(
            new BufferedReader(
                    new FileReader("test.arff")));
    // set class attribute
    unlabeled.setClassIndex(0);
    // create copy
    Instances labeled = new Instances(unlabeled);
    // label instances
    for (int i = 0; i < unlabeled.numInstances(); i++) {
        System.out.println(labeled.instance(i).classValue());
        System.out.print(", actual: " + labeled.classAttribute().value((int) labeled.instance(i).classValue()));
        double clsLabel = cls.classifyInstance(unlabeled.instance(i));
        labeled.instance(i).setClassValue(clsLabel);
        System.out.println(", predicted: " + labeled.classAttribute().value((int) clsLabel));
    }
    // save labeled data
    System.out.println("ended");
}
------------------------ Output---------------------------
1.0
, actual: Bud1? is a This is a new new string.txtIlocblobR(?????? #? #? #? #E?DSDB ` #? #? #, predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
2.0
, actual: This is a new new string
, predicted: *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES*
ended
However, the problem is that the predicted value is actually *WEKA*DUMMY*STRING*FOR*STRING*ATTRIBUTES* when it should have been a class label instead.
While saving the classifier, also save the Instances (just the header, no data required):
Instances instancesSample = new Instances(structure, 0);
instancesSample.setClassIndex(1);
...
ObjectOutputStream oos = new ObjectOutputStream(
new FileOutputStream("classifier.model"));
oos.writeObject(f);
oos.writeObject(instancesSample);
oos.flush();
oos.close();
After the model is loaded, read the saved Instances back as instancesSample.
While classifying:
ObjectInputStream objectInputStream = new ObjectInputStream(
        new BufferedInputStream(new FileInputStream("classifier.model")));
FilteredClassifier cls = (FilteredClassifier) objectInputStream.readObject();
Instances instancesSample = (Instances) objectInputStream.readObject();
objectInputStream.close();
int classIndex = 1;
Instance ins = unlabeled.instance(i);
double clsLabel = cls.classifyInstance(ins);
String prediction = instancesSample.attribute(classIndex).value((int) clsLabel);
System.out.println(", predicted: " + prediction);
I have added these lines to my classify method.
ArffLoader loader = new ArffLoader();
loader.setFile(new File("data.arff"));
Instances structure = loader.getDataSet();
structure.setClassIndex(1);
To obtain the class label I changed it to this:
System.out.println(", predicted: " + structure.classAttribute().value((int) clsLabel));
I am using a StringToWordVector filter to convert my ARFF to vector format, but it throws an exception:
weka.core.WekaException: weka.classifiers.bayes.NaiveBayesMultinomialUpdateable: Not enough training instances with class labels (required: 1, provided: 0)!
I tried the same in the Weka Explorer and it worked fine.
This is my code:
ArffLoader loader = new ArffLoader();
loader.setFile(new File("valid file"));
Instances structure = loader.getStructure();
structure.setClassIndex(0);
// train NaiveBayes
NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
FilteredClassifier f = new FilteredClassifier();
StringToWordVector s = new StringToWordVector();
f.setFilter(s);
f.setClassifier(n);
f.buildClassifier(structure);
Instance current;
while ((current = loader.getNextInstance(structure)) != null)
    n.updateClassifier(current);
// output generated model
System.out.println(n);
I have tried another example, but it still does not work:
ArffLoader loader = new ArffLoader();
loader.setFile(new File("valid file"));
Instances structure = loader.getStructure();
// train NaiveBayes
NaiveBayesMultinomialUpdateable n = new NaiveBayesMultinomialUpdateable();
FilteredClassifier f = new FilteredClassifier();
StringToWordVector s = new StringToWordVector();
s.setInputFormat(structure);
Instances struct = Filter.useFilter(structure, s);
struct.setClassIndex(0);
System.out.println(struct.numAttributes()); // only gives 2 or 1 attributes
n.buildClassifier(struct);
Instance current;
while ((current = loader.getNextInstance(struct)) != null)
    n.updateClassifier(current);
// output generated model
System.out.println(n);
The number of attributes printed is always 1 or 2.
It seems the StringToWordVector filter isn't working as expected.
Original folder : https://www.dropbox.com/sh/cma4hbe2r96ul1c/GL2wNdeVUz
Converted to arff: https://www.dropbox.com/s/efle6ci4lb5riq7/test1.arff
According to your ARFF, the class seems to be the second of the two attributes, so the problem may be here:
struct.setClassIndex(0);
Try:
struct.setClassIndex(1);
UPDATE: I made this change to the first example, and it gives no exception, and prints out:
The independent probability of a class
--------------------------------------
oil spill 40.0
police 989.0
The probability of a word given the class
-----------------------------------------
oil spill police
class Infinity Infinity
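Note also that loader.getStructure() returns just the header with zero instances, so in the second example Filter.useFilter(structure, s) builds its word dictionary from no data at all, which would explain the 1 or 2 attributes; loading with getDataSet() before filtering gives the filter actual text to expand.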
I trained an IBk classifier with some training data that I created manually, as follows:
ArrayList<Attribute> atts = new ArrayList<Attribute>();
ArrayList<String> classVal = new ArrayList<String>();
classVal.add("C1");
classVal.add("C2");
atts.add(new Attribute("a"));
atts.add(new Attribute("b"));
atts.add(new Attribute("c"));
atts.add(new Attribute("d"));
atts.add(new Attribute("##class##", classVal));
Instances dataRaw = new Instances("TestInstances", atts, 0);
dataRaw.setClassIndex(dataRaw.numAttributes() - 1);
double[] instanceValue1 = new double[]{3,0,1,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue1));
double[] instanceValue2 = new double[]{2,1,1,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue2));
double[] instanceValue3 = new double[]{2,0,2,0,0};
dataRaw.add(new DenseInstance(1.0, instanceValue3));
double[] instanceValue4 = new double[]{1,3,0,0,1};
dataRaw.add(new DenseInstance(1.0, instanceValue4));
double[] instanceValue5 = new double[]{0,3,1,0,1};
dataRaw.add(new DenseInstance(1.0, instanceValue5));
double[] instanceValue6 = new double[]{0,2,1,1,1};
dataRaw.add(new DenseInstance(1.0, instanceValue6));
Then I built the classifier:
IBk ibk = new IBk();
try {
    ibk.buildClassifier(dataRaw);
} catch (Exception e) {
    e.printStackTrace();
}
I want to create a new instance with an unlabeled class and classify it; I tried the following with no luck.
IBk ibk = new IBk();
try {
    ibk.buildClassifier(dataRaw);
    double[] values = new double[]{3, 1, 0, 0, -1};
    DenseInstance newInst = new DenseInstance(1.0, values);
    double classif = ibk.classifyInstance(newInst);
    System.out.println(classif);
} catch (Exception e) {
    e.printStackTrace();
}
I just get the following error:
weka.core.UnassignedDatasetException: DenseInstance doesn't have access to a dataset!
at weka.core.AbstractInstance.classAttribute(AbstractInstance.java:98)
at weka.classifiers.AbstractClassifier.classifyInstance(AbstractClassifier.java:74)
at TextCategorizationTest.instancesWithDoubleValues(TextCategorizationTest.java:136)
at TextCategorizationTest.main(TextCategorizationTest.java:33)
It looks like I am doing something wrong while creating the new instance. How exactly can I create an unlabeled instance?
Thanks in advance.
You will see this error when you classify a new instance that is not associated with a dataset. You have to associate every new instance you create with an Instances object, using setDataset.
//Make a placeholder Instances
//If you already have access to one, you can skip this step
Instances dataset = new Instances("testdata", attr, 1);
dataset.setClassIndex(classIdx);
DenseInstance newInst = new DenseInstance(1.0,values);
//To associate your instance with Instances object, in this case dataset
newInst.setDataset(dataset);
After this you can classify newly created instance.
double classif = ibk.classifyInstance(newInst);
http://www.cs.tufts.edu/~ablumer/weka/doc/weka.core.Instance.html
Detailed Implementation Link
The problem is with this line:
double classif = ibk.classifyInstance(newInst);
When you try to classify newInst, Weka throws an exception because newInst has no Instances object (i.e., dataset) associated with it - thus it does not know anything about its class attribute.
You should first create a new Instances object similar to dataRaw, add your unlabeled instance to it, set the class index, and only then classify it, e.g.:
Instances dataUnlabeled = new Instances("TestInstances", atts, 0);
dataUnlabeled.add(newInst);
dataUnlabeled.setClassIndex(dataUnlabeled.numAttributes() - 1);
double classif = ibk.classifyInstance(dataUnlabeled.firstInstance());
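Note that add() copies the instance and associates the copy (not your original newInst) with dataUnlabeled, which is why the code classifies dataUnlabeled.firstInstance() rather than newInst itself.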
See pages 203-204 of the WEKA documentation. That helped me a lot! (The Weka Manual is a PDF file located in your WEKA installation folder. Just open documentation.html and it will point you to the PDF manual.)
Copy-pasting some snippets of the code listings of Chapter 17 (Using the WEKA API / Creating datasets in memory) should help you solve the task.
Once a 10-fold cross-validation is done with a classifier, how can I print out the predicted class of every instance and the distribution for these instances?
J48 j48 = new J48();
Evaluation eval = new Evaluation(newData);
eval.crossValidateModel(j48, newData, 10, new Random(1));
When I tried something similar to below, it said that the classifier is not built.
for (int i = 0; i < newData.numInstances(); i++) {
    System.out.println(j48.distributionForInstance(newData.instance(i)));
}
What I'm trying to do is the same as the WEKA GUI function where, once a classifier is trained, I can click "Visualize classifier errors" > Save and find the predicted class in the file. But now I need it to work in my own Java code.
I have tried something like below:
J48 j48 = new J48();
Evaluation eval = new Evaluation(newData);
StringBuffer forPredictionsPrinting = new StringBuffer();
weka.core.Range attsToOutput = null;
Boolean outputDistribution = new Boolean(true);
eval.crossValidateModel(j48, newData, 10, new Random(1), forPredictionsPrinting, attsToOutput, outputDistribution);
Yet it gives me the error:
Exception in thread "main" java.lang.ClassCastException: java.lang.StringBuffer cannot be cast to weka.classifiers.evaluation.output.prediction.AbstractOutput
The crossValidateModel() method can take a forPredictionsPrinting varargs parameter that is a weka.classifiers.evaluation.output.prediction.AbstractOutput instance.
The important part of that is a StringBuffer to hold a string representation of all the predictions. The following code is untested JRuby, but you should be able to convert it for your needs.
j48 = J48.new
eval = Evaluation.new(newData)
predictions = java.lang.StringBuffer.new
eval.crossValidateModel(j48, newData, 10, Random.new(1), predictions, Range.new('1'), true)
# the variable predictions now holds a string of all the individual predictions
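For newer WEKA versions where the varargs must be an AbstractOutput (which is exactly what the ClassCastException above complains about), a minimal Java sketch is to wrap the buffer in a PlainText output object; the attribute range "1" and the distribution flag are carried over from the JRuby example:
J48 j48 = new J48();
Evaluation eval = new Evaluation(newData);
StringBuffer predictions = new StringBuffer();
PlainText output = new PlainText();   // weka.classifiers.evaluation.output.prediction.PlainText
output.setBuffer(predictions);
output.setHeader(newData);            // dataset header used to format the predictions
output.setAttributes("1");            // echo attribute 1 alongside each prediction
output.setOutputDistribution(true);   // include the class distribution
eval.crossValidateModel(j48, newData, 10, new Random(1), output);
// predictions now holds one line per instance: actual, predicted, distribution
System.out.println(predictions);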
I was stuck some days ago. I wanted to evaluate a Weka classifier in MATLAB using a matrix instead of loading from an ARFF file. I used http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface and the following source code. I hope this helps someone else.
import weka.classifiers.*;
import java.util.*
wekaClassifier = javaObject('weka.classifiers.trees.J48');
wekaClassifier.buildClassifier(processed);%Loaded from loadARFF
e = javaObject('weka.classifiers.Evaluation',processed);%Loaded from loadARFF
myrand = Random(1);
plainText = javaObject('weka.classifiers.evaluation.output.prediction.PlainText');
buffer = javaObject('java.lang.StringBuffer');
plainText.setBuffer(buffer)
bool = javaObject('java.lang.Boolean',true);
range = javaObject('weka.core.Range','1');
array = javaArray('java.lang.Object',3);
array(1) = plainText;
array(2) = range;
array(3) = bool;
e.crossValidateModel(wekaClassifier,testing,10,myrand,array)%'testing' is another Instances object loaded like 'processed'
e.toClassDetailsString
Asdrúbal López-Chau
clc
clear
%Load from disk
fileDataset = 'cm1.arff';
myPath = 'C:\Users\Asdrubal\Google Drive\Respaldo\DoctoradoALCPC\Doctorado ALC PC\AlcMobile\AvTh\MyPapers\Papers2014\UnderOverSampling\data\Skewed\datasetsKeel\';
javaaddpath('C:\Users\Asdrubal\Google Drive\Respaldo\DoctoradoALCPC\Doctorado ALC PC\AlcMobile\JarsForExperiments\weka.jar');
wekaOBJ = loadARFF([myPath fileDataset]);
%Transform the data into MATLAB format
[data, featureNames, targetNDX, stringVals, relationName] = ...
weka2matlab(wekaOBJ,'[]');
%Create testing and training sets in matlab format (this can be improved)
[tam, dim] = size(data);
idx = randperm(tam);
testIdx = idx(1 : tam*0.3);
trainIdx = idx(tam*0.3 + 1:end);
trainSet = data(trainIdx,:);
testSet = data(testIdx,:);
%Transform the training and the testing sets into the Weka format
testingWeka = matlab2weka('testing', featureNames, testSet);
trainingWeka = matlab2weka('training', featureNames, trainSet);
%Now evaluate classifier
import weka.classifiers.*;
import java.util.*
wekaClassifier = javaObject('weka.classifiers.trees.J48');
wekaClassifier.buildClassifier(trainingWeka);
e = javaObject('weka.classifiers.Evaluation',trainingWeka);
myrand = Random(1);
plainText = javaObject('weka.classifiers.evaluation.output.prediction.PlainText');
buffer = javaObject('java.lang.StringBuffer');
plainText.setBuffer(buffer)
bool = javaObject('java.lang.Boolean',true);
range = javaObject('weka.core.Range','1');
array = javaArray('java.lang.Object',3);
array(1) = plainText;
array(2) = range;
array(3) = bool;
e.crossValidateModel(wekaClassifier,testingWeka,10,myrand,array)
e.toClassDetailsString