I'm trying to classify text using Weka's naive Bayes classifier.
I trained the classifier on these two phrases:
en "Hello"
it "La casa è"
The idea is to create a classifier for each n-gram size (1 <= n <= 5) and then compute the result as a weighted sum of the probabilities of each classifier for each language.
The code for training the classifier with a specific n-gram size is the following:
public void evaluate(int sizeGrams) throws Exception {
    trainData.setClassIndex(0);
    filter = new StringToWordVector();
    filter.setAttributeIndices("last");
    MyNGramTokenizer ngram_tok = new MyNGramTokenizer();
    ngram_tok.setNGramMinSize(sizeGrams);
    ngram_tok.setNGramMaxSize(sizeGrams);
    filter.setTokenizer(ngram_tok);
    classifier = new FilteredClassifier();
    classifier.setFilter(filter);
    classifier.setClassifier(new NaiveBayes());
    Evaluation eval = new Evaluation(trainData);
    eval.crossValidateModel(classifier, trainData, 2, new Random(1));
}
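For reference, the weighted combination of the per-size classifiers mentioned above could be sketched roughly as follows. This is plain Java and not part of Weka: the class name `NGramEnsemble`, the method `combine`, and the example weights are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class NGramEnsemble {

    // distributions.get(i) is the class distribution produced by the classifier
    // for n-gram size i + 1; weights[i] is the (hypothetical) weight of that classifier.
    static Map<String, Double> combine(List<Map<String, Double>> distributions, double[] weights) {
        if (distributions.size() != weights.length) {
            throw new IllegalArgumentException("one weight per classifier is required");
        }
        Map<String, Double> combined = new LinkedHashMap<>();
        double weightSum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            weightSum += weights[i];
            for (Map.Entry<String, Double> e : distributions.get(i).entrySet()) {
                // Accumulate weight * probability per language.
                combined.merge(e.getKey(), weights[i] * e.getValue(), Double::sum);
            }
        }
        if (weightSum <= 0.0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        // Divide by the total weight so the combined scores sum to 1 again.
        final double total = weightSum;
        combined.replaceAll((language, score) -> score / total);
        return combined;
    }

    public static void main(String[] args) {
        List<Map<String, Double>> perSize = List.of(
                Map.of("it", 0.9, "en", 0.1),   // e.g. the size-1 classifier
                Map.of("it", 0.6, "en", 0.4));  // e.g. the size-2 classifier
        System.out.println(combine(perSize, new double[] {1.0, 3.0}));
    }
}
```

Normalizing by the total weight keeps the result a probability distribution, which makes the combined scores comparable across inputs.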
If I try to classify the text "casa" the results are:
Classifying
casa
Classify ngrams:
Size 1
{it=0.9999999999966434, en=3.356604905116531E-12}
Size 2
{it=0.9999999975201513, en=2.479848603138736E-9}
Size 3
{it=0.49999999999999617, en=0.5000000000000039}
Size 4
{it=1.8321005992748378E-6, en=0.9999981678994008}
Size 5
{it=2.479848603138678E-9, en=0.9999999975201515}
Does anyone know why the probability of the Italian class shrinks as the n-gram size grows, while that of the English class increases?
I expected quite the opposite.
Thanks
For the past week or so, I have been trying to get a neural network to function using RGB images, but no matter what I do it seems to only be predicting one class.
I have read all the links I could find from people encountering this problem and experimented with a lot of different things, but it always ends up predicting only one of the two output classes. I have checked the batches going into the model, increased the size of the dataset, increased the original image size from 28*28 to 56*56 pixels, increased the number of epochs, and done a lot of model tuning. I have even tried a simple non-convolutional neural network as well as dumbing down my own CNN model, yet it changes nothing.
I have also looked into how the data is passed in for the training set (specifically ImageRecordReader), but this input structure (the folder structure and how the data is passed into the training set) works perfectly when given grayscale images: it was originally created for the MNIST dataset, where it reached 99% accuracy.
Some context: I use the folder names as my labels, i.e. folder(0) and folder(1), for both training and testing data, as there are only two output classes. The training set contains 320 images of class 0 and 240 images of class 1, whereas the testing set is made up of 79 and 80 images respectively.
Code below:
private static final Logger log = LoggerFactory.getLogger(MnistClassifier.class);
private static final String basePath = System.getProperty("java.io.tmpdir") + "/ISIC-Images";
public static void main(String[] args) throws Exception {
int height = 56;
int width = 56;
int channels = 3; // RGB Images
int outputNum = 2; // 2 digit classification
int batchSize = 1;
int nEpochs = 1;
int iterations = 1;
int seed = 1234;
Random randNumGen = new Random(seed);
// vectorization of training data
File trainData = new File(basePath + "/Training");
FileSplit trainSplit = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator(); // parent path as the image label
ImageRecordReader trainRR = new ImageRecordReader(height, width, channels, labelMaker);
trainRR.initialize(trainSplit);
DataSetIterator trainIter = new RecordReaderDataSetIterator(trainRR, batchSize, 1, outputNum);
// vectorization of testing data
File testData = new File(basePath + "/Testing");
FileSplit testSplit = new FileSplit(testData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
ImageRecordReader testRR = new ImageRecordReader(height, width, channels, labelMaker);
testRR.initialize(testSplit);
DataSetIterator testIter = new RecordReaderDataSetIterator(testRR, batchSize, 1, outputNum);
log.info("Network configuration and training...");
Map<Integer, Double> lrSchedule = new HashMap<>();
lrSchedule.put(0, 0.06); // iteration #, learning rate
lrSchedule.put(200, 0.05);
lrSchedule.put(600, 0.028);
lrSchedule.put(800, 0.0060);
lrSchedule.put(1000, 0.001);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.l2(0.0008)
.updater(new Nesterovs(new MapSchedule(ScheduleType.ITERATION, lrSchedule)))
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.weightInit(WeightInit.XAVIER)
.list()
.layer(0, new ConvolutionLayer.Builder(5, 5)
.nIn(channels)
.stride(1, 1)
.nOut(20)
.activation(Activation.IDENTITY)
.build())
.layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
.kernelSize(2, 2)
.stride(2, 2)
.build())
.layer(2, new ConvolutionLayer.Builder(5, 5)
.stride(1, 1)
.nOut(50)
.activation(Activation.IDENTITY)
.build())
.layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
.kernelSize(2, 2)
.stride(2, 2)
.build())
.layer(4, new DenseLayer.Builder().activation(Activation.RELU)
.nOut(500).build())
.layer(5, new OutputLayer.Builder(LossFunctions.LossFunction.SQUARED_LOSS)
.nOut(outputNum)
.activation(Activation.SOFTMAX)
.build())
.setInputType(InputType.convolutionalFlat(56, 56, 3)) // InputType.convolutional for normal image
.backprop(true).pretrain(false).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
net.setListeners(new ScoreIterationListener(10));
log.debug("Total num of params: {}", net.numParams());
// evaluation while training (the score should go down)
for (int i = 0; i < nEpochs; i++) {
net.fit(trainIter);
log.info("Completed epoch {}", i);
Evaluation eval = net.evaluate(testIter);
log.info(eval.stats());
trainIter.reset();
testIter.reset();
}
ModelSerializer.writeModel(net, new File(basePath + "/Isic.model.zip"), true);
}
Output from running the model:
[Image: odd iteration scores]
[Image: evaluation metrics]
Any insight would be much appreciated.
I would suggest changing the activation functions in the two convolutional layers (layers 0 and 2 in your configuration) to a non-linear function. You could try ReLU or tanh.
You may refer to the documentation for a list of available activation functions.
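One way to see why a non-linear activation matters: layers with an identity activation compute linear maps, and a composition of linear maps is itself a single linear map, so stacking them adds no expressive power on its own. The sketch below shows this with plain matrices rather than DL4J; the class and method names are illustrative only.

```java
final class IdentityCollapse {

    // One "layer" with identity activation and no bias: y = W * x.
    static double[] apply(double[][] w, double[] x) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += w[i][j] * x[j];
            }
        }
        return y;
    }

    // Plain matrix product a * b.
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b[0].length; j++) {
                for (int k = 0; k < b.length; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] w1 = {{1, 2}, {3, 4}, {5, 6}};   // first layer, 2 inputs -> 3 outputs
        double[][] w2 = {{1, 0, -1}, {2, 1, 0}};    // second layer, 3 inputs -> 2 outputs
        double[] x = {0.5, -1.5};

        double[] twoLayers = apply(w2, apply(w1, x));
        double[] oneLayer = apply(multiply(w2, w1), x);

        // Both lines print the same vector: the two layers collapse into one.
        System.out.println(java.util.Arrays.toString(twoLayers));
        System.out.println(java.util.Arrays.toString(oneLayer));
    }
}
```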
An identity activation in a CNN almost never makes sense. Stick to RELU if you can.
I would instead shift your efforts towards gradient normalization or interspersing dropout layers. When a CNN doesn't learn, it is usually due to a lack of regularization.
Also: Never use squared loss with softmax. It never works. Stick to negative log likelihood.
I've never seen squared loss used with softmax in practice.
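One commonly cited reason for avoiding this combination: when a softmax output is confidently wrong, the gradient of the squared loss with respect to the logits becomes tiny, whereas the gradient of the negative log likelihood stays large. The following is a small numeric sketch in plain Java, not DL4J code; the class and method names are made up for illustration.

```java
final class SoftmaxLossGradients {

    // Numerically stable softmax.
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double sum = 0.0;
        double[] p = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < z.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Gradient of -log(p[target]) with respect to the logits: p - y.
    static double[] nllGradient(double[] z, int target) {
        double[] g = softmax(z);
        g[target] -= 1.0;
        return g;
    }

    // Gradient of sum_j (p_j - y_j)^2 with respect to the logits,
    // using dp_j/dz_i = p_j * (delta_ij - p_i).
    static double[] squaredLossGradient(double[] z, int target) {
        double[] p = softmax(z);
        double[] g = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            for (int j = 0; j < z.length; j++) {
                double y = (j == target) ? 1.0 : 0.0;
                double delta = (i == j) ? 1.0 : 0.0;
                g[i] += 2.0 * (p[j] - y) * p[j] * (delta - p[i]);
            }
        }
        return g;
    }

    public static void main(String[] args) {
        double[] logits = {10.0, -10.0}; // confidently predicts class 0
        int target = 1;                  // but the true class is 1
        System.out.println("NLL gradient:          "
                + java.util.Arrays.toString(nllGradient(logits, target)));
        System.out.println("Squared-loss gradient: "
                + java.util.Arrays.toString(squaredLossGradient(logits, target)));
    }
}
```

In this example the negative log likelihood gradient has magnitude close to 1, while the squared-loss gradient is many orders of magnitude smaller, so the network barely corrects its mistake.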
You can try L2 and L1 regularization (or both, which is called elastic net regularization).
It seems that using an Adam optimizer gave some promising results, as did increasing the batch size (I now have thousands of images); otherwise the net requires an absurd number of epochs (at least 50) to begin learning.
Thank you for all responses regardless.
I'm using my own bag-of-words model instead of Weka's StringToWordVector (this turned out to be a mistake, but as it's only a school project, I'd like to finish it with my approach), so I cannot use its cross-fold evaluation, as my BoW dictionary would contain the words of the training data too.
for (int n = 0; n < folds; n++) {
    List<String> allData = getAllReviews(); // 2000 reviews
    List<String> trainingData = getTrainingReviews(n, folds); // random 1800 reviews
    List<String> testData = getTestReviews(n, folds); // random 200 reviews
    bagOfWordsModel.train(trainingData); // builds a vocabulary of 1800 training reviews
    Instances inst = bagOfWordsModel.vectorize(allData); // returns 1800 instances with the class attribute set to positive or negative, and 200 without
    // todo: evaluate
    Classifier cModel = (Classifier) new NaiveBayes();
    cModel.buildClassifier(inst);
    Evaluation eTest = new Evaluation(inst);
    eTest.evaluateModel(cModel, inst);
    // print results
    String strSummary = eTest.toSummaryString();
    System.out.println(strSummary);
}
How can I evaluate this now? I thought Weka would automatically try to predict the class of the instances that have no class value. Instead, it tells me: weka.filters.supervised.attribute.Discretize: Cannot handle missing class values!
As you have both a training set and a testing set, you should train the classifier on the training data, which should be labelled, and then use the trained model to classify the unlabeled test data.
Classifier cModel = new NaiveBayes();
cModel.buildClassifier(trainingData);
And then, with the use of the following line you should be able to classify an unknown instance and get a prediction:
double clsLabel = cModel.classifyInstance(testData.instance(0));
Or you could use the Evaluation class to make predictions on the entire test set.
Evaluation evaluation = new Evaluation(trainingData);
evaluation.evaluateModel(cModel, testData);
You mentioned that you are attempting to implement your own cross-validation by taking a random subset of the data. There is a method in the Evaluation class (crossValidateModel) that does k-fold cross-validation for you.
Evaluation evaluation = new Evaluation(trainingData);
evaluation.crossValidateModel(cModel, trainingData, 10, new Random(1));
Note: Cross-validation is used when you don't have a separate test set. It holds a subset of the training data out of training and uses that subset to evaluate performance.
K-fold cross-validation splits the training data into K subsets. It sets one subset aside, trains the classifier on the remaining subsets, and then evaluates the model on the subset it set aside. It repeats this process until each subset has been used as the test set once.
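To make the splitting step concrete, here is a minimal plain-Java sketch of how the indices of a data set could be divided into K folds. It is not how Weka implements crossValidateModel (Weka also stratifies by class, for instance); the class name `KFoldSplit` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class KFoldSplit {

    // Returns k disjoint lists of indices that together cover 0..n-1,
    // shuffled with the given seed so the split is reproducible.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices out round-robin, like dealing cards.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(2000, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            List<Integer> test = folds.get(f);
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < folds.size(); g++) {
                if (g != f) {
                    train.addAll(folds.get(g));
                }
            }
            // A custom vocabulary would be built from the 'train' indices only.
            System.out.println("fold " + f + ": train=" + train.size() + ", test=" + test.size());
        }
    }
}
```

With 2000 reviews and 10 folds, each iteration trains on 1800 indices and tests on the remaining 200, which matches the sizes in the question.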
When training, only input the instances whose class is set.
In this line:
cModel.buildClassifier(inst);
you are training a naive Bayes classifier. Input only the training examples (!). Evaluate against all data (with labels!). The evaluation checks the predicted label against the actual label, if I remember correctly.
The 200 data points without a class label seem useless. What are they for?
I'd like to use the Stanford Classifier for text classification. My features are mostly textual, but there are some numeric features as well (e.g. the length of a sentence).
I started off with the ClassifierExample and replaced the current features with a simple real-valued feature F with value 100 if a stop light is BROKEN and 0.1 otherwise, which results in the following code (apart from the makeStopLights() function in lines 10-16, this is just the code of the original ClassifierExample class):
public class ClassifierExample {
protected static final String GREEN = "green";
protected static final String RED = "red";
protected static final String WORKING = "working";
protected static final String BROKEN = "broken";
private ClassifierExample() {} // not instantiable
// the definition of this function was changed!!
protected static Datum<String,String> makeStopLights(String ns, String ew) {
String label = (ns.equals(ew) ? BROKEN : WORKING);
Counter<String> counter = new ClassicCounter<>();
counter.setCount("F", (label.equals(BROKEN)) ? 100 : 0.1);
return new RVFDatum<>(counter, label);
}
public static void main(String[] args) {
// Create a training set
List<Datum<String,String>> trainingData = new ArrayList<>();
trainingData.add(makeStopLights(GREEN, RED));
trainingData.add(makeStopLights(GREEN, RED));
trainingData.add(makeStopLights(GREEN, RED));
trainingData.add(makeStopLights(RED, GREEN));
trainingData.add(makeStopLights(RED, GREEN));
trainingData.add(makeStopLights(RED, GREEN));
trainingData.add(makeStopLights(RED, RED));
// Create a test set
Datum<String,String> workingLights = makeStopLights(GREEN, RED);
Datum<String,String> brokenLights = makeStopLights(RED, RED);
// Build a classifier factory
LinearClassifierFactory<String,String> factory = new LinearClassifierFactory<>();
factory.useConjugateGradientAscent();
// Turn on per-iteration convergence updates
factory.setVerbose(true);
//Small amount of smoothing
factory.setSigma(10.0);
// Build a classifier
LinearClassifier<String,String> classifier = factory.trainClassifier(trainingData);
// Check out the learned weights
classifier.dump();
// Test the classifier
System.out.println("Working instance got: " + classifier.classOf(workingLights));
classifier.justificationOf(workingLights);
System.out.println("Broken instance got: " + classifier.classOf(brokenLights));
classifier.justificationOf(brokenLights);
}
}
In my understanding of linear classifiers, feature F should make the classification task pretty easy - after all, we just need to check whether the value of F is greater than some threshold. However, the classifier returns WORKING on every instance in the test set.
Now my question is: have I done something wrong, do I need to change some other parts of the code for real-valued features to work, or is there something wrong with my understanding of linear classifiers?
Your code looks fine. Note that typically with a Maximum Entropy classifier you provide binary valued features (1 or 0).
Here is some more reading on Maximum Entropy classifiers: http://web.stanford.edu/class/cs124/lec/Maximum_Entropy_Classifiers
Look at the slide titled "Feature-Based Linear Classifiers" to see the specific probability calculation for Maximum Entropy classifiers.
Here is the formula for your example case with 1 feature and 2 classes ("working", "broken"):
probability(c1) = exp(w1 * f1) / total
probability(c2) = exp(w2 * f1) / total
total = exp(w1 * f1) + exp(w2 * f1)
w1 is the learned weight for "working" and w2 is the learned weight for "broken".
The classifier selects the higher probability. Note that f1 = (100 or 0.1) your feature value.
If you consider your specific example data (2 classes, 1 feature, and the feature is always positive), it is not possible to build a maximum entropy classifier that separates that data; it will always guess all one way or the other.
For the sake of argument, say w1 > w2.
Say v > 0 is your feature value (either 100 or 0.1).
Then w1 * v > w2 * v, thus exp(w1 * v) > exp(w2 * v), so you'll always assign more probability to class1 regardless of what value v has.
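The argument can be checked numerically with a few lines of plain Java. This is only a sketch of the two-class formula above, not the Stanford Classifier API; the class name, method names, and weights are invented for the example.

```java
final class SingleFeatureMaxEnt {

    // Returns {P(class 1), P(class 2)} for a single feature value v,
    // with one weight per class and no bias term.
    static double[] probabilities(double w1, double w2, double v) {
        double a = w1 * v;
        double b = w2 * v;
        // Subtract the maximum before exponentiating to avoid overflow.
        double m = Math.max(a, b);
        double e1 = Math.exp(a - m);
        double e2 = Math.exp(b - m);
        return new double[] {e1 / (e1 + e2), e2 / (e1 + e2)};
    }

    // Returns 1 or 2, whichever class has the higher probability.
    static int predictedClass(double w1, double w2, double v) {
        double[] p = probabilities(w1, w2, v);
        return p[0] >= p[1] ? 1 : 2;
    }

    public static void main(String[] args) {
        double[][] weightPairs = {{0.5, -0.5}, {-0.3, 0.2}};
        for (double[] w : weightPairs) {
            // For any fixed pair of weights, both feature values get the same class.
            System.out.println("w1=" + w[0] + ", w2=" + w[1]
                    + " -> v=0.1: class " + predictedClass(w[0], w[1], 0.1)
                    + ", v=100: class " + predictedClass(w[0], w[1], 100.0));
        }
    }
}
```

Whatever weights training produces, the sign of (w1 - w2) * v is the same for v = 0.1 and v = 100, so both inputs land in the same class.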
I used IntelliJ IDEA to build a Maven project. The code is as follows:
System.out.println("Load data....");
SentenceIterator iter = new LineSentenceIterator(new File("/home/zs/programs/deeplearning4j-master/dl4j-test-resources/src/main/resources/raw_sentences.txt"));
iter.setPreProcessor(new SentencePreProcessor() {
    @Override
    public String preProcess(String sentence) {
        return sentence.toLowerCase();
    }
});
System.out.println("Build model....");
int batchSize = 1000;
int iterations = 30;
int layerSize = 300;
com.sari.Word2Vec vec= new com.sari.Word2Vec.Builder()
.batchSize(batchSize) //# words per minibatch.
.sampling(1e-5) // negative sampling. drops words out
.minWordFrequency(5) //
.useAdaGrad(false) //
.layerSize(layerSize) // word feature vector size
.iterations(iterations) // # iterations to train
.learningRate(0.025) //
.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning
.negativeSample(10) // sample size 10 words
.iterate(iter) //
.tokenizerFactory(tokenizer)
.build();
vec.fit();
System.out.println("Evaluate model....");
double cosSim = vec.similarity("day" , "night");
System.out.println("Similarity between day and night: "+cosSim);
This code follows the word2vec example in deeplearning4j, but the results are unstable: they differ widely from run to run. For example, the cosine similarity between 'day' and 'night' is sometimes as high as 0.98 and sometimes as low as 0.4.
Here are the results of two experiments
Evaluate model....
Similarity between day and night: 0.706292986869812
Evaluate model....
Similarity between day and night: 0.5550910234451294
Why are the results like this? I have only just started learning word2vec and there is a lot I don't understand yet, so I would appreciate any help. Thanks!
You have set the following line:
.minLearningRate(1e-2) // learning rate decays wrt # words. floor learning
But that is an extremely high value for a minimum learning rate. A high learning rate keeps the model from 'settling' in any state; instead, a few updates significantly change the learned representation. That is not a problem during the first few updates, but it is bad for convergence.
Solution:
Allow the learning rate to decay.
You can leave this line out completely or, if you must set it, use a more appropriate value such as 1e-15.
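To illustrate the effect of the floor, here is a sketch of a linearly decaying learning rate with a minimum, in the style of the original word2vec tool. Whether DL4J uses exactly this schedule is an assumption; the class and method names are hypothetical.

```java
final class DecayingLearningRate {

    // progress is the fraction of training completed, from 0.0 to 1.0.
    // The rate decays linearly from 'initial' but never drops below 'floor'.
    static double rate(double initial, double floor, double progress) {
        return Math.max(floor, initial * (1.0 - progress));
    }

    public static void main(String[] args) {
        double initial = 0.025;
        for (double progress : new double[] {0.0, 0.5, 0.9, 1.0}) {
            System.out.printf("progress %.1f: floor 1e-2 -> %.5f, floor 1e-4 -> %.5f%n",
                    progress,
                    rate(initial, 1e-2, progress),
                    rate(initial, 1e-4, progress));
        }
    }
}
```

Under this schedule, a floor of 1e-2 stops the decay once training is 60% done (0.025 * 0.4 = 0.01), so the final updates remain comparatively large.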
I wanted to use the Apache Commons Math FFT implementation (the FastFourierTransformer class) to process some dummy data: 8 samples that make up one complete sinusoidal cycle with a peak amplitude of 230. The code snippet I tried is below:
private double[] transform()
{
    double[] input = new double[8];
    input[0] = 0.0;
    input[1] = 162.6345596729059;
    input[2] = 230.0;
    input[3] = 162.63455967290594;
    input[4] = 2.8166876380389125E-14;
    input[5] = -162.6345596729059;
    input[6] = -230.0;
    input[7] = -162.63455967290597;
    double[] tempConversion = new double[input.length];
    FastFourierTransformer transformer = new FastFourierTransformer();
    try {
        Complex[] complx = transformer.transform(input);
        for (int i = 0; i < complx.length; i++) {
            double rr = (complx[i].getReal());
            double ri = (complx[i].getImaginary());
            tempConversion[i] = Math.sqrt((rr * rr) + (ri * ri));
        }
    } catch (IllegalArgumentException e) {
        System.out.println(e);
    }
    return tempConversion;
}
1) transformer.transform returns an array of complex numbers. Does that array contain the frequency component information about the input data, or does the tempConversion array that I created contain it? The values in the tempConversion array are:
2.5483305001488234E-16
920.0
4.0014578493024757E-14
2.2914314707516465E-13
5.658858581079313E-14
2.2914314707516465E-13
4.0014578493024757E-14
920.0
2) I searched a lot, but most places have no clear documentation (ideally with sample code) on what data format the algorithm expects. How do I use the result array to calculate the frequencies contained in the signal?
Your output data looks correct. You've calculated the magnitude of the complex FFT output at each frequency bin which corresponds to the energy in the input signal at the corresponding frequency for that bin. Since your input is purely real, the output is complex conjugate symmetric, and the last 3 output values are redundant.
So you have:
Bin   Freq         Magnitude
0     0 (DC)       2.5483305001488234E-16
1     Fs/8         920.0
2     Fs/4         4.0014578493024757E-14
3     3Fs/8        2.2914314707516465E-13
4     Fs/2 (Nyq)   5.658858581079313E-14
5     3Fs/8        2.2914314707516465E-13   # redundant - mirror image of bin 3
6     Fs/4         4.0014578493024757E-14   # redundant - mirror image of bin 2
7     Fs/8         920.0                    # redundant - mirror image of bin 1
All the values are effectively 0 apart from bin 1 (and its mirror image, bin 7), which corresponds to a frequency of Fs/8, as expected.
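To show how a bin index maps to a frequency, here is a small self-contained sketch that uses a naive DFT instead of Commons Math, so it runs with the standard library alone. The sample rate of 8000 Hz is an assumption for illustration (the question does not state one), and the class and method names are made up.

```java
final class DftBins {

    // Magnitude of each DFT bin, computed with the direct O(n^2) formula.
    static double[] magnitudes(double[] x) {
        int n = x.length;
        double[] mag = new double[n];
        for (int k = 0; k < n; k++) {
            double re = 0.0;
            double im = 0.0;
            for (int t = 0; t < n; t++) {
                double angle = -2.0 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            mag[k] = Math.hypot(re, im);
        }
        return mag;
    }

    // Centre frequency of bin k for an n-point transform at sample rate fs.
    static double binFrequency(int k, int n, double fs) {
        return k * fs / n;
    }

    public static void main(String[] args) {
        int n = 8;
        double sampleRate = 8000.0; // hypothetical sample rate in Hz
        double[] input = new double[n];
        for (int t = 0; t < n; t++) {
            input[t] = 230.0 * Math.sin(2.0 * Math.PI * t / n); // one full cycle
        }
        double[] mag = magnitudes(input);
        // Only bins 0..n/2 carry independent information for real input.
        for (int k = 0; k <= n / 2; k++) {
            System.out.printf("bin %d: %.1f Hz, magnitude %.3f%n",
                    k, binFrequency(k, n, sampleRate), mag[k]);
        }
    }
}
```

The peak magnitude of 920 equals amplitude * N / 2 = 230 * 8 / 2, so dividing a bin's magnitude by N / 2 recovers the amplitude of that sinusoidal component.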