How to create an attribute in Weka - java

I am working on a data mining project using WEKA in Java, and the instructions say that I have to create an Attribute object for each attribute in the dataset and add them to a FastVector. I tried looking at the API, but I don't think I'm doing it right. Can someone show me the right way to do it? I'm using the iris.arff file.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
public class StartWeka {
    public static void main(String[] args) throws Exception {
        Instances dataset = new Instances(new BufferedReader(new FileReader("C:/Users/Student/workspace/Data Mining/src/iris.arff.txt")));
        Instances train = new Instances(dataset);
        train.setClassIndex(train.numAttributes() - 1);
        System.out.println(dataset.toSummaryString());
        Attribute a1 = new Attribute("sepallength", 0);
        Attribute a2 = new Attribute("sepalwidth", 1);
        Attribute a3 = new Attribute("petalwidth", 2);
        FastVector attrs = new FastVector();
        attrs.addElement(a1);
    }
}

FastVector is deprecated. You can use an ArrayList<Attribute> instead.
If you load an ARFF file, however, you don't have to do any of that, because the loader builds the attributes from the file header. You can just do the following:
ArffLoader loader = new ArffLoader();
loader.setFile(new File("iris.arff"));
Instances structure = loader.getStructure();
structure.setClassIndex(structure.numAttributes() - 1);
From here, you can create a classifier based on your instances. Note that getStructure() returns only the header (the attribute definitions); call loader.getDataSet() if you also need the data rows.

Related

How to use StringToWordVector (weka) in java?

This is my arff file
@relation hamspam
@attribute text string
@attribute class {ham,spam}
@data
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
What I want to do is classify it with a Weka classifier in my Java program, but I don't know how to use StringToWordVector and then classify the data.
This is my code:
Classifier j48tree = new J48();
Instances train = new Instances(new BufferedReader(new FileReader("data.arff")));
StringToWordVector filter = new StringToWordVector();
What comes next? I don't know what to do.
// import required classes
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.stemmers.LovinsStemmer;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class ClassifierWithFilter {
    public static void main(String[] args) throws Exception {
        // load dataset
        DataSource source = new DataSource("/Users/amaryadav/Desktop/spamham.arff");
        Instances dataset = source.getDataSet();
        // set class index to the last attribute
        dataset.setClassIndex(dataset.numAttributes() - 1);
        // the base classifier
        J48 tree = new J48();
        // the filter; set every option before calling setInputFormat
        StringToWordVector filter = new StringToWordVector();
        filter.setIDFTransform(true);
        // available in Weka 3.6; later releases may require setStopwordsHandler instead
        filter.setUseStoplist(true);
        LovinsStemmer stemmer = new LovinsStemmer();
        filter.setStemmer(stemmer);
        filter.setLowerCaseTokens(true);
        filter.setInputFormat(dataset);
        // create the FilteredClassifier object
        FilteredClassifier fc = new FilteredClassifier();
        // specify filter
        fc.setFilter(filter);
        // specify base classifier
        fc.setClassifier(tree);
        // build the meta-classifier
        fc.buildClassifier(dataset);
        System.out.println(tree.graph());
        System.out.println(tree);
    }
}
This code wraps a J48 decision tree in a FilteredClassifier, so the StringToWordVector filter is applied to the text before the tree is trained on spamham.arff. Hope that helps.

Use Weka with Java for prediction on a test set

I am trying to get predictions on a test set using the evaluateModel function; however, evaluation.evaluateModel(classifier, newTest, output) throws an exception:
Exception in thread "main" weka.core.WekaException: No dataset structure provided!
import weka.classifiers.Evaluation;
import weka.core.Attribute;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.attributeSelection.CfsSubsetEval;
import weka.attributeSelection.ASSearch;
import weka.attributeSelection.BestFirst;
import weka.classifiers.functions.LinearRegression;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.filters.supervised.attribute.AttributeSelection;
import weka.classifiers.evaluation.output.prediction.CSV;

public void evaluateTest() throws Exception
{
    DataSource train = new DataSource(trainingData.toString());
    Instances traininstances = train.getDataSet();
    Attribute attr = traininstances.attribute("regressionLabel");
    int trainindex = attr.index();
    traininstances.setClassIndex(trainindex);
    DataSource test = new DataSource(testData.toString());
    Instances testinstances = test.getDataSet();
    Attribute testattr = testinstances.attribute(regressionLabel);
    int testindex = testattr.index();
    testinstances.setClassIndex(testindex);
    AttributeSelection filter = new AttributeSelection();
    weka.classifiers.AbstractClassifier classifier;
    filter.setSearch(this.search);
    filter.setEvaluator(this.eval);
    filter.setInputFormat(traininstances); // initializing the filter once with training set
    Instances newTrain = AttributeSelection.useFilter(traininstances, filter); // configures the Filter based on train instances and returns filtered instances
    Instances newTest = AttributeSelection.useFilter(testinstances, filter);
    classifier = new LinearRegression();
    classifier.buildClassifier(newTrain);
    StringBuffer buffer = new StringBuffer();
    CSV output = new CSV();
    output.setBuffer(buffer);
    output.setOutputFile(predictFile);
    Evaluation evaluation = new Evaluation(newTrain);
    evaluation.evaluateModel(classifier, newTest, output);
}
The same thing works with evaluation.crossValidateModel.

How to create multiple files based on one Freemarker Template

I'm having a little bit of trouble with FreeMarker right now. What I basically want to do in my template is iterate over a list of elements and create a new file for each element.
<#assign x=3>
<#list 1..x as i>
${i}
...create a new file with the output of this loop iteration...
</#list>
I did not find anything about this in the FreeMarker manual or on Google. Is there a way to do this?
You can implement this with a custom directive. See freemarker.template.TemplateDirectiveModel, and particularly TemplateDirectiveBody. Custom directives can specify the Writer used in their nested content. So you can do something like <#output file="...">...</#output>, where the nested content will be written into the Writer you have provided in your TemplateDirectiveModel implementation, which in this case should write into the file specified. (FMPP does this too: http://fmpp.sourceforge.net/qtour.html#sect4)
You cannot do this using only FreeMarker. Its idea is to produce a single output stream from your template. It doesn't even care whether you save the result to a file, pass it directly to a TCP socket, store it in memory as a string, or do anything else.
If you really want to achieve this, you have to handle the file separation yourself. For example, you can insert a special marker line like:
<#assign x=3>
<#list 1..x as i>
${i}
%%%%File=output${i}.html
...
</#list>
After that, you post-process the FreeMarker output yourself, looking for the lines that start with %%%%File= and starting a new file at each such line.
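The post-processing step described here could be sketched roughly as follows. This is only an illustration under assumptions: the class name OutputSplitter and its method names are made up for this example, the marker is assumed to occupy a whole line, any text before the first marker is discarded, and the target directory is assumed to exist.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of FreeMarker.
public class OutputSplitter {
    // Marker prefix taken from the template in this answer.
    static final String MARKER = "%%%%File=";

    /** Splits rendered output into (file name -> content) pairs, in order of appearance. */
    public static Map<String, String> split(String rendered) {
        Map<String, StringBuilder> parts = new LinkedHashMap<>();
        StringBuilder current = null; // stays null until the first marker is seen
        for (String line : rendered.split("\\R")) {
            if (line.startsWith(MARKER)) {
                String name = line.substring(MARKER.length()).trim();
                current = parts.computeIfAbsent(name, k -> new StringBuilder());
            } else if (current != null) {
                current.append(line).append('\n');
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        parts.forEach((name, text) -> result.put(name, text.toString()));
        return result;
    }

    /** Writes each part as a UTF-8 file into the given directory. */
    public static void writeAll(Map<String, String> parts, Path dir) throws IOException {
        for (Map.Entry<String, String> e : parts.entrySet()) {
            Files.write(dir.resolve(e.getKey()), e.getValue().getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        // In practice this string would be the output of template.process(...).
        String rendered = "%%%%File=output1.html\nfirst\n%%%%File=output2.html\nsecond\n";
        writeAll(split(rendered), Paths.get(System.getProperty("java.io.tmpdir")));
    }
}
```

Rendering the template into a StringWriter first and passing its contents to split keeps the template itself free of any file handling, at the cost of holding the whole output in memory.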
As ddekany said, you can do that by implementing a directive. I have coded a little example:
package spikes;

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import freemarker.core.Environment;
import freemarker.template.Configuration;
import freemarker.template.SimpleScalar;
import freemarker.template.Template;
import freemarker.template.TemplateDirectiveBody;
import freemarker.template.TemplateDirectiveModel;
import freemarker.template.TemplateException;
import freemarker.template.TemplateModel;
import io.vertx.core.json.JsonArray;
import io.vertx.core.json.JsonObject;

class OutputDirective implements TemplateDirectiveModel {
    @Override
    public void execute(
            Environment env,
            @SuppressWarnings("rawtypes") Map params,
            TemplateModel[] loopVars,
            TemplateDirectiveBody body)
            throws TemplateException, IOException {
        SimpleScalar file = (SimpleScalar) params.get("file");
        // try-with-resources flushes and closes the file even if rendering fails
        try (FileWriter fw = new FileWriter(new File(file.getAsString()))) {
            // the nested content of <#output> is written to the file, not to the main output
            body.render(fw);
        }
    }
}

public class FreemarkerTest {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration(Configuration.VERSION_2_3_0);
        cfg.setDefaultEncoding("UTF-8");
        JsonObject model = new JsonObject()
                .put("entities", new JsonArray()
                        .add(new JsonObject()
                                .put("name", "Entity1"))
                        .add(new JsonObject()
                                .put("name", "Entity2")));
        Template template = new Template("Test", "<#assign model = model?eval_json><#list model.entities as entity><#output file=entity.name + \".txt\">This is ${entity.name} entity\n</#output></#list>", cfg);
        Map<String, Object> root = new HashMap<String, Object>();
        // register the directive under the name used in the template
        root.put("output", new OutputDirective());
        root.put("model", model.encode());
        Writer out = new OutputStreamWriter(System.out);
        template.process(root, out);
    }
}
This will generate two files:
"Entity1.txt": This is Entity1 entity
"Entity2.txt": This is Entity2 entity
:-)

How to specify the base classifier in stacking method when using Weka API?

I was trying to use the stacking method of the Weka API in Java and found a tutorial for a single classifier. I tried implementing stacking using the method described in the tutorial, but the classification is done with Weka's default ZeroR classifier. I was able to set the meta classifier using "setMetaClassifier", but I am not able to change the base classifier. What is the proper way to set the base classifier in stacking?
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.classifiers.meta.Stacking;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class startweka {
    public static void main(String[] args) throws Exception {
        BufferedReader breader = new BufferedReader(new FileReader("C:/newtrain.arff"));
        Instances train = new Instances(breader);
        train.setClassIndex(train.numAttributes() - 1);
        breader.close();
        String[] stackoptions = new String[1];
        {
            stackoptions[0] = "-w weka.classifiers.functions.SMO";
        }
        Stacking nb = new Stacking();
        J48 j48 = new J48();
        SMO jj = new SMO();
        nb.setMetaClassifier(j48);
        nb.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.crossValidateModel(nb, train, 10, new Random(1));
        System.out.println(eval.toSummaryString("results", true));
    }
}
OK, I found the answer on another forum (Weka Nabble). The code for setting the base classifier is:
Stacking nb=new Stacking();
SMO smo=new SMO();
Classifier[] stackoptions = new Classifier[1];
stackoptions[0] = smo;
nb.setClassifiers(stackoptions);
OR
Stacking nb=new Stacking();
SMO smo=new SMO();
Classifier[] stackoptions = new Classifier[] {smo};
nb.setClassifiers(stackoptions);

Error in Bugzilla Code to create Bug

Here is the code to create a new bug in Bugzilla using Java, but I am getting the following error:
BugCreator2.java:20: error: cannot find symbol
factory.setHttpClient(httpClient);
^
symbol: method setHttpClient(HttpClient)
location: variable factory of type XmlRpcCommonsTransportFactor
Note: BugCreator2.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 error
I have used the following JAR files:
commons-httpclient-3.0.1
java-rt-jar-stubs-1.5.0
ws-commons-util-1.0.1
ws-commons-util-1.0.1-sources
xmlrpc-3.0
xmlrpc-3.0-common
I don't know if all of them are required.
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.httpclient.HttpClient;
import org.apache.xmlrpc.XmlRpcException;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;
import org.apache.xmlrpc.client.XmlRpcCommonsTransportFactory;

public class BugCreator2 {
    public static void main(String s[])
            throws MalformedURLException, XmlRpcException {
        HttpClient httpClient = new HttpClient();
        XmlRpcClient rpcClient = new XmlRpcClient();
        XmlRpcCommonsTransportFactory factory = new XmlRpcCommonsTransportFactory(rpcClient);
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        factory.setHttpClient(httpClient);
        rpcClient.setTransportFactory(factory);
        config.setServerURL(new URL("http://URL/bugzilla/xmlrpc.cgi"));
        rpcClient.setConfig(config);
        // map of the login data
        Map loginMap = new HashMap();
        loginMap.put("login", "username@abc.com");
        loginMap.put("password", "*********");
        loginMap.put("rememberlogin", "Bugzilla_remember");
        // login to bugzilla
        Object loginResult = rpcClient.execute("User.login", new Object[]{loginMap});
        System.err.println("loginResult=" + loginResult);
        // map of the bug data
        Map bugMap = new HashMap();
        bugMap.put("product", "Demo");
        bugMap.put("component", "Demo_project");
        bugMap.put("summary", "Bug created for test");
        bugMap.put("description", "This is text ");
        bugMap.put("version", "unspecified");
        bugMap.put("op_sys", "Windows");
        bugMap.put("platform", "PC");
        bugMap.put("priority", "P2");
        bugMap.put("severity", "Normal");
        bugMap.put("status", "NEW");
        // create bug
        Object createResult = rpcClient.execute("Bug.create", new Object[]{bugMap});
        System.err.println("createResult = " + createResult);
    }
}
After much effort, I found that the problem was with the versions of the JARs. You need to use the exact JARs listed below, as some other versions do not support some of the methods (such as setHttpClient).
JARs used:
commons-httpclient-3.0.1
commons-logging-1.1.3
java-rt-jar-stubs-1.5.0
org.apache.commons.codec_1.3.0.v201101211617
ws-commons-util-1.0.2
xmlrpc-client-3.1.3
xmlrpc-common-3.1.3
