Using a saved model for prediction with Weka (Eclipse + Java)

I'm confused about what to put on the lines starting with "Instances originalTrain=". Can anyone please help me correct this error? I'm new to Weka. We are creating a disease prediction system using Weka in Java.
import weka.classifiers.Classifier;
import weka.core.Instances;
public class Main {
public static void main(String[] args) throws Exception
{
String rootPath="/some/where/";
Instances originalTrain= //instances here (don't know to complete this statement)
//load model
Classifier cls = (Classifier) weka.core.SerializationHelper.read(rootPath+"tree.model");
//predict instance class values
Instances originalTrain= //load or create Instances to predict (This statement too)
//which instance to predict class value
int s1=0;
//perform your prediction
double value=cls.classifyInstance(originalTrain.instance(s1));
//get the prediction percentage or distribution
double[] percentage=cls.distributionForInstance(originalTrain.instance(s1));
//get the name of the class value
String prediction=originalTrain.classAttribute().value((int)value);
System.out.println("The predicted value of instance "+
Integer.toString(s1)+
": "+prediction);
//Format the distribution
String distribution="";
for(int i=0; i <percentage.length; i=i+1)
{
if(i==value)
{
distribution=distribution+"*"+Double.toString(percentage[i])+",";
}
else
{
distribution=distribution+Double.toString(percentage[i])+",";
}
}
distribution=distribution.substring(0, distribution.length()-1);
System.out.println("Distribution:"+ distribution);
}
}

For completeness, the code snippet in the question originates from "Get prediction percentage in WEKA using own Java code and a model".
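originalTrain should be your training instances (or any data set with the same attributes, in the same order, as the data the model was trained on). There are two ways that I know of to add instances to originalTrain.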
The first method loads data from an .arff file and is based on instructions found here.
// requires: import weka.core.converters.ConverterUtils.DataSource;
// rootPath should be the directory that holds the .arff file
// filename should hold the complete name of the .arff file
public static Instances instanceData(String rootPath, String filename) throws Exception
{
    // initialize the source and read the whole data set
    DataSource source = new DataSource(rootPath + filename);
    Instances data = source.getDataSet();
    // set the class to the last attribute of the data (may need to tweak)
    if (data.classIndex() == -1)
        data.setClassIndex(data.numAttributes() - 1);
    return data;
}
Alternatively, you can create and add instances manually, as described in the answer to "Define input data for clustering using WEKA API".
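As a side note on the question's code: the loop that formats the distribution appends a trailing comma and then strips it with substring. A sketch using java.util.StringJoiner avoids that extra step. The class and method names (DistributionFormat, formatDistribution) are hypothetical, and the probabilities in main are made-up sample values, not real Weka output.

```java
import java.util.StringJoiner;

public class DistributionFormat {

    /**
     * Joins the class probabilities with commas and marks the predicted
     * class with a leading '*', like the loop in the question does.
     */
    static String formatDistribution(double[] distribution, int predictedIndex) {
        StringJoiner joiner = new StringJoiner(",");
        for (int i = 0; i < distribution.length; i++) {
            String value = Double.toString(distribution[i]);
            joiner.add(i == predictedIndex ? "*" + value : value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // sample values standing in for cls.distributionForInstance(...)
        double[] percentage = {0.25, 0.75};
        int predicted = 1; // would be (int) cls.classifyInstance(...)
        System.out.println("Distribution:" + formatDistribution(percentage, predicted));
    }
}
```

In the question's code this would be called as formatDistribution(percentage, (int) value).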

Related

How do I read a file into an object, or is there a better solution? Please see the code below for context

In my Java class, we need to read this file and somehow convert its content into objects.
import java.util.Scanner;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
public class Calendar {
public Appointment[] appointments;
Calendar()
{
appointments = null;
}
Calendar(int capacity, String filename)
{
Appointment[] appointments = new Appointment[capacity];
//you can see that appointments is an Appointment object
readCalendarFromFile(filename);}
private void readCalendarFromFile(String fileName){
Scanner fileRead = null;
try
{
fileRead = new Scanner(new FileInputStream("appointments.txt"));
for(int r = 0; r < 30; r++)
appointments[r]= fileRead.nextLine(); // <-- This is where I get my error, as I cannot convert a String into an object. Is there a way that I can pass this
fileRead.close();
}
catch (FileNotFoundException fe)
{
fe.printStackTrace();
System.err.println("Unable to open the file " + fileName + " for reading.");
}
}
}
Is there any way I can convert the file text into an object, or do I have to do something else with it? Appointment has to be an object, so sadly I can't change it into anything else.
You have to have a class Appointment somewhere, and what you are trying to do is add an object of the type Appointment to the array appointments, based on the info you get from the text file, right?
So, you have your for loop that reads every line from the text file, and then you need to create instances of Appointment for each line.
The class Appointment has some kind of constructor, that you need to call to create a new object (read: "a new instance") from it.
Let's assume it looks like this:
public Appointment(String title, String time, String location) {
this.title = title;
this.time = time;
this.location = location;
}
Let's also assume that every line in the file appointments.txt is formatted in the following way:
<Title>, <Time>, <Location>
This means you have to parse each line you read from the file by splitting it on the delimiter, which in this case is ",". Do a quick search on how to split Strings in Java (String.split); it's pretty easy, actually.
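A minimal sketch of the split step, assuming the "<Title>, <Time>, <Location>" line format and the three-argument Appointment constructor above (both are assumptions made in this answer, not taken from the asker's code). The nested Appointment class is a stand-in, and the sample line is made up:

```java
public class AppointmentParsing {

    /** Stand-in for the asker's Appointment class, using the constructor assumed above. */
    static class Appointment {
        final String title;
        final String time;
        final String location;

        Appointment(String title, String time, String location) {
            this.title = title;
            this.time = time;
            this.location = location;
        }
    }

    /** Splits a "<Title>, <Time>, <Location>" line and builds an Appointment from the pieces. */
    static Appointment parseLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + parts.length + ": " + line);
        }
        // trim() removes the space that follows each comma in the assumed format
        return new Appointment(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    public static void main(String[] args) {
        Appointment a = parseLine("Dentist, 10:30, Main Street");
        System.out.println(a.title + " | " + a.time + " | " + a.location);
    }
}
```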
When you have all the bits of information in separate variables, you have to call the constructor of Appointment, to create a new appointment that you can then add to your array. Assuming that you have three Strings with the title, the time and the location of the appointment (or whatever info you have in the text file), this would look like this:
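try {
    fileRead = new Scanner(new FileInputStream(fileName));
    int counter = 0;
    while (fileRead.hasNextLine() && counter < appointments.length) {
        String lineRead = fileRead.nextLine();
        // parse the line into three String variables
        String[] parts = lineRead.split(",");
        String title = parts[0].trim();
        String time = parts[1].trim();
        String location = parts[2].trim();
        appointments[counter] = new Appointment(title, time, location);
        counter++;
    }
    // close the Scanner only after the loop, not inside it
    fileRead.close();
} catch (FileNotFoundException ex) {
    // Do some exception handling in here, or just print the stack trace
    ex.printStackTrace();
}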
The line I want you to pay the most attention to is the one that says new Appointment(title, time, location). The difference between this and the code you posted is that here I create a new object of type Appointment, which matches the element type of the array you created earlier in the line Appointment[] appointments = new Appointment[capacity].
You tried to directly add a String to the array, although you declared an array of the type Appointment, not of the type String.
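There is a second, separate problem in the question's constructor: Appointment[] appointments = new Appointment[capacity]; declares a new local variable that shadows the field of the same name. The field appointments therefore stays null, and readCalendarFromFile would throw a NullPointerException when it assigns to it. The hypothetical ShadowDemo below reproduces this with a String[] so it compiles on its own; the fix is to assign the field (appointments = new Appointment[capacity];) instead of declaring a new variable.

```java
public class ShadowDemo {

    /** Buggy: the constructor declares a local array, so the field stays null. */
    static class Shadowing {
        String[] items;

        Shadowing(int capacity) {
            String[] items = new String[capacity]; // local variable; the field is untouched
        }
    }

    /** Fixed: the constructor assigns the field itself. */
    static class Fixed {
        String[] items;

        Fixed(int capacity) {
            items = new String[capacity]; // same as this.items = new String[capacity]
        }
    }

    public static void main(String[] args) {
        System.out.println(new Shadowing(30).items == null);
        System.out.println(new Fixed(30).items.length);
    }
}
```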
You should read up on the topic of objects in Java in general, and what constructors are, what they do and how you use them.
For example, this topic is explained really well and exhaustively in the official Java tutorials from Oracle (the company that develops the Java language). I linked you the specific section that talks about constructors, but I would suggest that you read at least the whole chapter and everything before it that helps you understand what they actually talk about.
Hope this helps :)

Unable to read one file and generate one output CSV at a time in Java

I am reading files from one folder, and each file can have a varying number of records. So I am creating two lists here:
List<Address> addressList;
List<List<Address>> addressLists;
I call the readIncomingFiles method, which returns a List<List<Address>>:
// The code structure of method which reads incomingFiles
public List<List<Address>> readIncomingFiles() {
//some lines of code Processing data
if (!(addressList == null || addressList.size() == 0)) {
addressLists.add(addressList);
}
return addressLists;
}
Now addressLists holds all the records from all the files. In the main method I have a process method, which first reads addressLists, containing all the records. Suppose there are three files with three records each: addressLists will hold nine records in total.
void process() { // main method
    this.addressLists = this.readIncomingFiles();
    // Create a list of all outgoing files that will be generated and kept in the destination folder (from getOutgoingFileName)
    List<String> outgoingFileNames = this.getOutgoingFileName();
    for (String outgoingFile : outgoingFileNames) {
        // validate: only process if the generated output file name contains "csv"
        if (outgoingFile.contains("csv")) {
            processFile();
        }
    }
}

void processFile() {
    for (List<Address> listOfAddress : this.addressLists) {
        for (Address address : listOfAddress) {
            this.csvOut = new OutputCsvDataDto();
            // Process files and records.
            // OutputCsvDataDto holds the data from the result-generating method, which writes records into the OutputCsvDataDto list.
        }
    }
}
The problem is that it reads all files and all records, since readIncomingFiles returns a List<List<Address>>. Also, the getOutgoingFileName method generates three output file names, one at a time, and returns them as a list. The code structure of that method is pasted below:
public List<String> getOutgoingFileName() {
    List<String> outgoingFilenames = new ArrayList<>();
    for (File incomingFile : incomingFileFolder.listFiles()) {
        outgoingFilenames.add("results_" + incomingFile.getName());
    }
    return outgoingFilenames;
}
How can I read one record at a time? If I read one record at a time, how will I process the other records? I am new to Java.
I'm not sure if I got you correctly, but let's sum it up:
there's one folder with many files,
these files contain variable count of lines,
each line is a record of some kind (presumably a postal address),
you want to read all these files, process them one by one, line by line, and save the output to a file whose name is based on the input file's name
If all of the above is correct, let's start by modifying the whole process, since currently you read all the files into memory, and that is something you don't really want to do. What if each file is, say, 4 GiB? A better approach is to process the whole thing in a manner resembling a "pipeline" rather than a "storage".
So, it goes more or less like this:
public static void main(String[] args) {
File incomingDir = new File(args[0]);
File outgoingDir = new File(args[1]);
for (File f : incomingDir.listFiles()) {
processFile(f, outgoingDir);
}
}
private static void processFile(File incomingFile, File outgoingDir) {
File outgoingFile = new File(outgoingDir, "results-" + incomingFile.getName());
for (String line : /* read lines from incomingFile */) {
Address address = parseAddress(line);
/* write address to outgoingFile */
}
}
private static Address parseAddress(String line) {
Address address;
/* do parsing */
return address;
}
Of course you need to adapt the code, probably using a BufferedReader in a while loop (instead of the for loop in processFile above), but this is meant to sketch out the concept rather than give a "copy-paste" answer. Also think about how you could make this code work in parallel, and whether that always makes sense.
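One possible fleshed-out version of the sketch above, using java.nio.file and a BufferedReader in a while loop as suggested. It is an illustration under assumptions: toCsvRow is a hypothetical stand-in for parseAddress plus CSV formatting (it only trims comma-separated fields), and output files are named "results_" + the input name, as in the question's getOutgoingFileName. Only one line of one file is held in memory at a time:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PipelineSketch {

    public static void main(String[] args) throws IOException {
        processFolder(Paths.get(args[0]), Paths.get(args[1]));
    }

    /** Processes every regular file in incomingDir, one file at a time. */
    static void processFolder(Path incomingDir, Path outgoingDir) throws IOException {
        Files.createDirectories(outgoingDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(incomingDir)) {
            for (Path incomingFile : files) {
                if (Files.isRegularFile(incomingFile)) {
                    processFile(incomingFile, outgoingDir);
                }
            }
        }
    }

    /** Streams one input file, line by line, into its own "results_" output file. */
    static void processFile(Path incomingFile, Path outgoingDir) throws IOException {
        Path outgoingFile = outgoingDir.resolve("results_" + incomingFile.getFileName());
        try (BufferedReader reader = Files.newBufferedReader(incomingFile, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(outgoingFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                writer.write(toCsvRow(line));
                writer.newLine();
            }
        }
    }

    /** Hypothetical stand-in for parseAddress + CSV output: trims each comma-separated field. */
    static String toCsvRow(String line) {
        String[] fields = line.split(",");
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(fields[i].trim());
        }
        return row.toString();
    }
}
```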

Java updating file reference after renaming

Hi there, I have a problem dealing with some legacy code.
I need a way to get the changed File from the parseFile() method up to the calling doWithFileList() method.
public static void main(String[] args) throws IOException {
File file1 = File.createTempFile("file1", ".tmp");
File file2 = File.createTempFile("file2", ".tmp");
ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file1);
fileList.add(file2);
doWithFileList(fileList);
}
static void doWithFileList(List<File> fileList) {
for (File file : fileList) {
String result = parseFile(file);
}
//Do something with the (now incorrect) file objects
for (File file : fileList) {
// always false here
if (!file.exists()) {
System.out.println("File does not exist anymore");
}
}
}
private static String parseFile(File file) {
//1. Get information from the File
//2. Use this information to load an object from the Database
//3. return some property of this object
//4. depending on another property of the DB object rename the file
file.renameTo(new File(file.getAbsoluteFile() + ".renamed"));
return "valueParsedFromFile";
}
I know that File objects are immutable.
The problem is that in my real-world code the parseFile() method currently only does steps 1-3, but I need to add step 4.
The renaming is not a problem, but I need to get the new file name somehow to the calling method.
In the real-life problem there is a deeper call stack, spanning multiple objects, between those methods.
What would be the best way to get the changed file name back to the beginning of the call hierarchy, where I can change the object in the list?
My best guess at the moment would be to create a ReturnObject that holds both the String to return and the new File object. But then I'd have to refactor a bunch of methods on the way up, so I would need to create a bunch of different return objects.
The following possibilities come to mind:
1. Pass a mutable object, e.g. a new String[1], and set the value there. (Mega-ugly, because you have side effects and it's not a pure function anymore.) (On the other hand: you already have side effects, so go figure ;-))
2. Use a generic return object like String[], a Map, or a Pair implementation that you can find in various utility libraries (e.g. org.colllib.datastruct.Pair).
3. Use a hand-crafted return object.
Personally, I'd probably go with (2), but it might also be (3).
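Option 3 might look like the sketch below. ParseResult, its accessors, and the index-based loop in doWithFileList are illustrative assumptions, not the asker's real code; steps 1-3 of parseFile are elided, and the returned value is the placeholder string from the question. The key point is that the caller replaces the stale File in the list with the one that was returned:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RenameResultDemo {

    /** Hand-crafted return object: the parsed value plus the file's current location. */
    static final class ParseResult {
        private final String value;
        private final File file;

        ParseResult(String value, File file) {
            this.value = value;
            this.file = file;
        }

        String getValue() { return value; }
        File getFile() { return file; }
    }

    /** Renames the file and reports where it now lives, along with the parsed value. */
    static ParseResult parseFile(File file) {
        // Steps 1-3 (read the file, load the DB object, pick a property) are elided here.
        String value = "valueParsedFromFile";
        File renamed = new File(file.getAbsolutePath() + ".renamed");
        // renameTo returns false on failure; in that case keep the old reference
        File current = file.renameTo(renamed) ? renamed : file;
        return new ParseResult(value, current);
    }

    /** Replaces each stale File in the list with the one returned by parseFile. */
    static void doWithFileList(List<File> fileList) {
        for (int i = 0; i < fileList.size(); i++) {
            ParseResult result = parseFile(fileList.get(i));
            fileList.set(i, result.getFile());
        }
    }

    public static void main(String[] args) throws IOException {
        List<File> fileList = new ArrayList<>();
        fileList.add(File.createTempFile("file1", ".tmp"));
        doWithFileList(fileList);
        for (File file : fileList) {
            System.out.println(file + " exists: " + file.exists());
            file.delete(); // clean up the temp file
        }
    }
}
```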
Using a ReturnObject seems to be the only solution, as far as I know.

text classifier with weka: how to correctly train a classifier issue

I'm trying to build a text classifier using Weka, but distributionForInstance returns a probability of 1.0 for one class and 0.0 for all the others, so classifyInstance always returns the same class as the prediction. Something in the training doesn't work correctly.
ARFF training
@relation test1
@attribute tweetmsg String
@attribute classValues {politica,sport,musicatvcinema,infogeneriche,fattidelgiorno,statopersonale,checkin,conversazione}
@DATA
"Renzi Berlusconi Salvini Bersani",politica
"Allegri insulta la terna arbitrale",sport
"Bravo Garcia",sport
Training methods
public void trainClassifier(final String INPUT_FILENAME) throws Exception
{
getTrainingDataset(INPUT_FILENAME);
//trainingInstances consists of feature vector of every input
for(Instance currentInstance : inputDataset)
{
Instance currentFeatureVector = extractFeature(currentInstance);
currentFeatureVector.setDataset(trainingInstances);
trainingInstances.add(currentFeatureVector);
}
classifier = new NaiveBayes();
try {
//classifier training code
classifier.buildClassifier(trainingInstances);
//storing the trained classifier to a file for future use
weka.core.SerializationHelper.write("NaiveBayes.model",classifier);
} catch (Exception ex) {
System.out.println("Exception in training the classifier."+ex);
}
}
private Instance extractFeature(Instance inputInstance) throws Exception
{
String tweet = inputInstance.stringValue(0);
StringTokenizer defaultTokenizer = new StringTokenizer(tweet);
List<String> tokens=new ArrayList<String>();
while (defaultTokenizer.hasMoreTokens())
{
String t= defaultTokenizer.nextToken();
tokens.add(t);
}
Iterator<String> a = tokens.iterator();
while(a.hasNext())
{
String token=(String) a.next();
String word = token.replaceAll("#","");
if(featureWords.contains(word))
{
double cont=featureMap.get(featureWords.indexOf(word))+1;
featureMap.put(featureWords.indexOf(word),cont);
}
else{
featureWords.add(word);
featureMap.put(featureWords.indexOf(word), 1.0);
}
}
attributeList.clear();
for(String featureWord : featureWords)
{
attributeList.add(new Attribute(featureWord));
}
attributeList.add(new Attribute("Class", classValues));
int indices[] = new int[featureMap.size()+1];
double values[] = new double[featureMap.size()+1];
int i=0;
for(Map.Entry<Integer,Double> entry : featureMap.entrySet())
{
indices[i] = entry.getKey();
values[i] = entry.getValue();
i++;
}
indices[i] = featureWords.size();
values[i] = (double)classValues.indexOf(inputInstance.stringValue(1));
trainingInstances = createInstances("TRAINING_INSTANCES");
return new SparseInstance(1.0,values,indices,1000000);
}
private void getTrainingDataset(final String INPUT_FILENAME)
{
try{
ArffLoader trainingLoader = new ArffLoader();
trainingLoader.setSource(new File(INPUT_FILENAME));
inputDataset = trainingLoader.getDataSet();
}catch(IOException ex)
{
System.out.println("Exception in getTrainingDataset Method");
}
System.out.println("dataset "+inputDataset.numAttributes());
}
private Instances createInstances(final String INSTANCES_NAME)
{
//create an Instances object with initial capacity as zero
Instances instances = new Instances(INSTANCES_NAME,attributeList,0);
//sets the class index as the last attribute
instances.setClassIndex(instances.numAttributes()-1);
return instances;
}
public static void main(String[] args) throws Exception
{
Classificatore wekaTutorial = new Classificatore();
wekaTutorial.trainClassifier("training_set_prova_tent.arff");
wekaTutorial.testClassifier("testing.arff");
}
public Classificatore()
{
attributeList = new ArrayList<Attribute>();
initialize();
}
private void initialize()
{
featureWords= new ArrayList<String>();
featureMap = new TreeMap<>();
classValues= new ArrayList<String>();
classValues.add("politica");
classValues.add("sport");
classValues.add("musicatvcinema");
classValues.add("infogeneriche");
classValues.add("fattidelgiorno");
classValues.add("statopersonale");
classValues.add("checkin");
classValues.add("conversazione");
}
TESTING METHODS
public void testClassifier(final String INPUT_FILENAME) throws Exception
{
getTrainingDataset(INPUT_FILENAME);
//trainingInstances consists of feature vector of every input
Instances testingInstances = createInstances("TESTING_INSTANCES");
for(Instance currentInstance : inputDataset)
{
//extractFeature method returns the feature vector for the current input
Instance currentFeatureVector = extractFeature(currentInstance);
//Make the currentFeatureVector to be added to the trainingInstances
currentFeatureVector.setDataset(testingInstances);
testingInstances.add(currentFeatureVector);
}
try {
//Classifier deserialization
classifier = (Classifier) weka.core.SerializationHelper.read("NaiveBayes.model");
//classifier testing code
for(Instance testInstance : testingInstances)
{
double score = classifier.classifyInstance(testInstance);
double[] vv= classifier.distributionForInstance(testInstance);
for(int k=0;k<vv.length;k++){
System.out.println("distribution "+vv[k]); //these are the probabilities of the classes, and as a result I get 1.0 for one and 0.0 for all the others
}
System.out.println(testingInstances.attribute("Class").value((int)score));
}
} catch (Exception ex) {
System.out.println("Exception in testing the classifier."+ex);
}
}
I want to create a text classifier for short messages; this code is based on this tutorial http://preciselyconcise.com/apis_and_installations/training_a_weka_classifier_in_java.php . The problem is that the classifier predicts the wrong class for almost every message in testing.arff because the class probabilities are not correct. training_set_prova_tent.arff has the same number of messages per class.
The example I'm following uses a featureWords.dat file and assigns 1.0 to a word if it is present in a message. Instead, I want to build my own dictionary from the words in training_set_prova_tent plus the words in the testing set, and assign each word its number of occurrences.
P.S.
I know that this is exactly what the StringToWordVector filter does, but I haven't found any example that explains how to use it with two files: one for the training set and one for the test set. So it seemed easier to adapt the code I found.
Thank you very much
It seems like you changed the code from the website you referenced in some crucial points, but not in a good way. I'll try to draft what you're trying to do and what mistakes I've found.
What you (probably) wanted to do in extractFeature is
Split each tweet into words (tokenize)
Count the number of occurrences of these words
Create a feature vector representing these word counts plus the class
What you've overlooked in that method is
You never reset your featureMap. The line
Map<Integer,Double> featureMap = new TreeMap<>();
originally was at the beginning of extractFeature, but you moved it to initialize. That means you keep adding up the word counts but never reset them: for each new tweet, the word count also includes the counts of all previous tweets. I'm sure that is not what you wanted.
You don't initialize featureWords with the words you want as features. Yes, you create an empty list, but you fill it iteratively with each tweet. The original code initialized it once in the initialize method and it never changed after that. There are two problems with that:
With each new tweet, new features (words) get added, so your feature vector grows with each tweet. That wouldn't be such a big problem (SparseInstance), but that means that
Your class attribute is always in another place. These two lines work for the original code, because featureWords.size() is basically a constant, but in your code the class label will be at index 5, then 8, then 12, and so on, but it must be the same for every instance.
indices[i] = featureWords.size();
values[i] = (double) classValues.indexOf(inputInstance.stringValue(1));
This also manifests itself in the fact that you build a new attributeList with each new tweet, instead of only once in initialize, which is bad for already explained reasons.
There may be more stuff, but - as it is - your code is rather unfixable. What you want is much closer to the tutorial source code which you modified than your version.
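To illustrate the two fixes described above (a vocabulary fixed once, and counts reset for every message so the class always sits at the same index), here is a plain-Java sketch without Weka types. FixedVocabulary, toVector, and whitespace tokenization are assumptions for illustration; in the Weka code, the attribute list would be built once from such a vocabulary instead of inside extractFeature. This sketch deliberately builds the dictionary from the training messages only, so words that appear only in test messages are ignored, which matches how StringToWordVector takes its word set from the first batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FixedVocabulary {
    private final Map<String, Integer> wordIndex = new HashMap<>();
    private final List<String> classValues;

    FixedVocabulary(List<String> trainingTexts, List<String> classValues) {
        this.classValues = classValues;
        // Build the vocabulary ONCE, from the training texts only
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                wordIndex.putIfAbsent(token, wordIndex.size());
            }
        }
    }

    /** Whitespace tokenization, stripping '#' as the question's code does. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            String word = t.replace("#", "");
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** Number of features plus the class; identical for every message. */
    int vectorLength() {
        return wordIndex.size() + 1;
    }

    /** Counts start from zero for every message; the class is always the last entry. */
    double[] toVector(String text, String label) {
        double[] vector = new double[vectorLength()]; // fresh array, so no counts carry over
        for (String token : tokenize(text)) {
            Integer idx = wordIndex.get(token);
            if (idx != null) {
                vector[idx] += 1.0; // words unseen in training are ignored
            }
        }
        // indexOf returns -1 for a label that is not in classValues
        vector[vectorLength() - 1] = classValues.indexOf(label);
        return vector;
    }

    public static void main(String[] args) {
        FixedVocabulary vocab = new FixedVocabulary(
                Arrays.asList("Renzi Berlusconi Salvini Bersani",
                        "Allegri insulta la terna arbitrale", "Bravo Garcia"),
                Arrays.asList("politica", "sport"));
        System.out.println(Arrays.toString(vocab.toVector("Bravo Bravo Renzi", "sport")));
    }
}
```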
Also, you should look into StringToWordVector because it seems like this is exactly what you want to do:
Converts String attributes into a set of attributes representing word occurrence (depending on the tokenizer) information from the text contained in the strings. The set of words (attributes) is determined by the first batch filtered (typically training data).

storing names and other information

I am creating a prison system in which I need to store prisoner information, because one of the methods has to print out all of it. I want the program to remember information such as name, ID, crimes, etc. How can I go about doing this?
About the posted answers, I don't think it needs to be something that complicated because I haven't learnt any of this for the assignment. All I want is for my program to print out the prisoner ID, name, starting and ending date, crime with just one run of the program after I am prompted to enter the information.
INPUT/OUTPUT
New Prisoner
Enter Name:
Enter crime:
Enter Name:
Enter crime:
Prisoner information
(name) has committed (crime)
(name) has committed (crime)
The short answer is "a database."
Your question suggests that the following could be overwhelming, but it could be worth the effort to read about "Macto", an end-to-end sample Ayende Rahien has been writing about.
You haven't made particularly clear in your question as to whether you just want to store the prisoner details in memory while the program is running, or if you want to persist the prisoners to disk, so that you can close your program and load them again next time you start it.
If it's the former, you can just store the prisoners in an array or a list. For example, assuming your Prisoner class looks something like this:
public class Prisoner {
private String name;
private String crime;
public Prisoner(String name, String crime) {
this.name = name;
this.crime = crime;
}
public String getName() {
return name;
}
public String getCrime() {
return crime;
}
}
You can then store the prisoners in a list...
List<Prisoner> prisoners = new LinkedList<Prisoner>();
prisoners.add(new Prisoner("Bob", "Murder"));
prisoners.add(new Prisoner("John", "Fraud"));
...and then iterate over the list and print the details out...
for(Prisoner p : prisoners) {
System.out.println(p.getName() + " committed " + p.getCrime());
}
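For the interactive input shown in the question (enter a name, enter a crime, then print everything), a minimal sketch using java.util.Scanner might look like this. The nested Prisoner class mirrors the one above, and ending input with an empty name is an assumption, since the question doesn't say how input stops. Passing in the Scanner and PrintStream instead of using System.in and System.out directly makes the logic easy to try with a String:

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class PrisonerConsole {

    /** Mirrors the Prisoner class above (name and crime only). */
    static class Prisoner {
        private final String name;
        private final String crime;

        Prisoner(String name, String crime) {
            this.name = name;
            this.crime = crime;
        }

        String getName() { return name; }
        String getCrime() { return crime; }
    }

    /** Reads name/crime pairs until an empty name (or end of input) is entered. */
    static List<Prisoner> readPrisoners(Scanner in, PrintStream out) {
        List<Prisoner> prisoners = new ArrayList<>();
        out.println("New Prisoner");
        while (true) {
            out.print("Enter Name: ");
            if (!in.hasNextLine()) break;
            String name = in.nextLine().trim();
            if (name.isEmpty()) break; // assumed convention: empty name ends input
            out.print("Enter crime: ");
            if (!in.hasNextLine()) break;
            String crime = in.nextLine().trim();
            prisoners.add(new Prisoner(name, crime));
        }
        return prisoners;
    }

    static void printPrisoners(List<Prisoner> prisoners, PrintStream out) {
        out.println("Prisoner information");
        for (Prisoner p : prisoners) {
            out.println(p.getName() + " has committed " + p.getCrime());
        }
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        printPrisoners(readPrisoners(in, System.out), System.out);
    }
}
```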
If you're looking for a way to persist the prisoner details between runs of the program, there are a number of possible approaches, most of which have already been mentioned. In most cases a database is the best solution for storing records, with JDBC being the simplest way of connecting to and interacting with a database.
For simplicity however, I would suggest storing the details in a CSV (comma separated value) file. A CSV file is simply a plain text file that stores each record on a new line, with a comma separating each field. For example:
Bob, Murder
John, Fraud
There are a number of CSV reading libraries around (see here); however, it's quite easy to read and write a CSV file with no external libraries. Below is an example:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
public class PrisonerStore {
    /**
     * The file the prisoners are stored in
     */
    private File store;
    public PrisonerStore(File store) {
        this.store = store;
    }
    /**
     * Appends the specified prisoner to the file
     * @param prisoner the prisoner to save
     */
    public void savePrisoner(Prisoner prisoner) throws IOException {
        // try-with-resources closes the writer even if write() throws
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(store, true))) {
            writer.write(prisoner.getName() + "," + prisoner.getCrime());
            writer.newLine();
        }
    }
    /**
     * Reads all prisoners from the file and returns a list of prisoners
     * @return the prisoners stored in the file
     */
    public List<Prisoner> loadPrisoners() throws IOException {
        List<Prisoner> prisoners = new LinkedList<Prisoner>();
        try (BufferedReader br = new BufferedReader(new FileReader(store))) {
            // Read each line of the file and create a Prisoner object from it
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split(",");
                // trim() tolerates a space after the comma, as in "Bob, Murder"
                prisoners.add(new Prisoner(parts[0].trim(), parts[1].trim()));
            }
        }
        return prisoners;
    }
}
In your code you can then do something like the following:
PrisonerStore store = new PrisonerStore(new File("C:\\myFile.csv"));
Prisoner p1 = new Prisoner("Bob", "Murder");
Prisoner p2 = new Prisoner("John", "Fraud");
try {
store.savePrisoner(p1);
store.savePrisoner(p2);
List<Prisoner> list = store.loadPrisoners();
for(Prisoner p : list) {
System.out.println(p.getName() + " committed " + p.getCrime());
}
} catch (IOException e) {
System.out.println("Error storing prisoners");
}
If this information needs to persist beyond the life of the VM, you'll have to write it to physical storage (persistence is the mechanism that moves data between physical storage and its in-memory representation).
There are several solutions for this purpose:
Java Object Serialization
A prevalent system with a library like Prevayler
XML serialization with a library like XStream
A database (relational or not)
Serialization is Java's built-in persistence mechanism, but it is very fragile. Prevalence is based on serialization; I have no experience with it and I'm not sure it solves serialization's weaknesses. XML serialization is interesting and quick to set up, especially with a library like XStream. Finally, a database is the most "standard" solution but introduces some complexity. Depending on your needs, use plain JDBC or JPA for data access.
My advice: If you don't need a database, go for XML serialization and use XStream. See the Two Minute Tutorial on XStream web site to get started. If you don't need persistence at all (beyond the life of the VM), just store the prisoners in a List.
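Java Object Serialization, the first option in the list above, needs no external library. A minimal sketch, assuming a Serializable Prisoner with just a name and a crime; SerializationStore and the file name in main are hypothetical. Declaring serialVersionUID explicitly helps keep previously saved files readable after compatible changes to the class, which addresses part of the fragility mentioned above:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationStore {

    /** Prisoner must implement Serializable to be written with ObjectOutputStream. */
    static class Prisoner implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String name;
        private final String crime;

        Prisoner(String name, String crime) {
            this.name = name;
            this.crime = crime;
        }

        String getName() { return name; }
        String getCrime() { return crime; }
    }

    /** Writes the whole list to the file, replacing its previous contents. */
    static void save(List<Prisoner> prisoners, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new ArrayList<>(prisoners)); // ArrayList is Serializable
        }
    }

    /** Reads back the list written by save(). */
    @SuppressWarnings("unchecked")
    static List<Prisoner> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (List<Prisoner>) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File file = new File("prisoners.ser"); // hypothetical file name
        List<Prisoner> prisoners = new ArrayList<>();
        prisoners.add(new Prisoner("Bob", "Murder"));
        prisoners.add(new Prisoner("John", "Fraud"));
        save(prisoners, file);
        for (Prisoner p : load(file)) {
            System.out.println(p.getName() + " committed " + p.getCrime());
        }
    }
}
```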
Where do you want to store the information?
If you want to store it in program memory, you can use a static member variable, like this:
// Prisoner.java
class Prisoner {
    public String name;
    public int age;
}
// Prisoners.java
class Prisoners {
    public static Prisoner[] getAll() {
        Prisoner[] data = new Prisoner[0];
        // Load from the database into data
        return data;
    }
}
// Test.java
class Test {
    public static void out() {
        System.out.println(Main.allPrisoners.length);
    }
}
// Main.java
public class Main {
    public static Prisoner[] allPrisoners;
    public static void main(String[] args) {
        allPrisoners = Prisoners.getAll();
        // From now on, all prisoners stay in program memory until the program exits
    }
}
If you are doing web development, you can use WebCache.
If you are looking to use a database, one place to start is Hibernate. It's a Java library that provides mapping between Java objects and relational database tables.
If you want to persist to a file system using an object serialization routine, I'd recommend XStream to serialize XML or JSON text.
Based on the added text to the question, I'd recommend having a look at XStream just because it is so simple to use if you need to get the data structures to a file on the disk. However, more basically...
You probably just need to make a Prisoner class that has the stuff you need Prisoner to have, such as a name, identifier, etc, etc.
public class Prisoner
{
private String name;
private int identifier;
public Prisoner(String aName, int anId)
{
name = aName;
identifier = anId;
}
public String toString()
{
return "Prisoner[ name = " + name + ", id = " + identifier + " ]";
}
}
Then you can store them in a Map<String, Prisoner> to make finding them easier.
Map<String, Prisoner> prisonMap = new HashMap<String, Prisoner>();
To enter them in from the command line, you'll probably need to use System.in
Sun provides a good tutorial for it.
If you just want to print them back out on the command line, you'll iterate over the Map's keyset and get each value or just iterate over the values and just use System.out.println() to print them out.
for(Prisoner p : prisonMap.values())
{
System.out.println(p);
}
Or, use XStream to print out the XML to file or the System.out stream.
