I have seen questions related to this, but they are a little bit confusing.
I have gone through the Kaggle site. The task there is to distinguish between dogs and cats (I am looking at the training part only).
I want to apply my own Java SVM implementation to this "train" data. How do I do that? My normal SVM takes only numeric values with a +1/-1 classification, but here the inputs are images.
So do I need to convert the images into numerical data first? What would the flow be: convert to numeric data, then run the SVM, and then what is the final result? (A rough sketch of what I mean is below.)
Also, where can I find a large file (around 1 GB) for training an SVM that contains only numeric values, not images?
I don't want to use libraries such as libsvm for this; I only need to do the training.
Any suggestions?
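For what it's worth, here is a minimal sketch of that conversion step in plain Java, assuming grayscale pixel intensities are used as features and the label is +1 for dog and -1 for cat (the class and method names are made up for illustration, and a real pipeline would need better features than raw pixels):

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Hypothetical helper: turns one training image into the numeric feature vector
// plus the +1/-1 label that a basic binary SVM expects.
public class ImageToVector {

    // Flatten the image into normalized grayscale intensities in [0, 1].
    // In practice every image must first be resized to the same width/height
    // so that all feature vectors have the same length.
    static double[] toFeatures(File imageFile) throws Exception {
        BufferedImage img = ImageIO.read(imageFile);
        double[] features = new double[img.getWidth() * img.getHeight()];
        int i = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                features[i++] = (r + g + b) / (3.0 * 255.0);
            }
        }
        return features;
    }

    // Label convention for a binary SVM: dog -> +1, cat -> -1
    // (the Kaggle training images are named like "dog.123.jpg" / "cat.456.jpg").
    static int toLabel(File imageFile) {
        return imageFile.getName().startsWith("dog") ? +1 : -1;
    }
}

The flow would then be: build one (features, label) pair per training image, train the SVM on those pairs, and the final result is the learned model (weights/support vectors), which you use to predict +1 (dog) or -1 (cat) for new images.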
I am a college student and obviously a newbie in machine learning, so please bear with me.
I am implementing a Java application that should recognize and classify road/traffic signs, and my main problem is creating and training an SVM with SURF descriptors.
I have read a lot and came across many different things about SVMs; I became even more confused, but I will try to clarify what I understood.
FIRST: I know that I must have a dataset that includes positive images (images that contain my objects) and negative images (images that don't) to train the SVM. I tried to look at how it is done in Python, due to the lack of documentation for Java, and came across this code:
import numpy as np

# loads a numeric matrix from the CSV file, one row per sample
dataset = np.loadtxt('./datasetExample.csv', delimiter=",")
And it was as simple as that. What is the CSV doing here? Where are the images of the dataset? I know that the data has to be represented as numbers, like inside the CSV file, but where do they come from and what do they have to do with the SVM?
SECOND: I found that in almost all resources the SVM is trained in one of two ways, with HOG descriptors or with Bag of Words, and I didn't find the SURF descriptor method (actually, I am not sure whether it is possible, but my Dr. said it can be done).
THIRD: Since I am classifying traffic signs, I need more than one class (e.g. one for warning signs, one for regulatory signs, etc.), and each class of course has sub-classes; for example, the speed limit class includes different types of signs. I came across something called a multi-class SVM and I really don't know what that is!
Currently I manage to extract SURF descriptors from a given image using this code:
// signObject is the path to the image; featureDetector and descriptorExtractor
// are a SURF detector/extractor created beforehand (e.g. via
// FeatureDetector.create(FeatureDetector.SURF) and DescriptorExtractor.create(DescriptorExtractor.SURF))
Mat objectImage = Highgui.imread(signObject, Highgui.CV_LOAD_IMAGE_COLOR);
featureDetector.detect(objectImage, objectKeyPoints);
descriptorExtractor.compute(objectImage, objectKeyPoints, objectDescriptors);

// keep the image, its keypoints and its descriptors for later training
datasetObjImage.add(objectImage);
datasetKeyPoints.add(objectKeyPoints);
datasetDescriptors.add(objectDescriptors);
What I was planning to do is to loop over all the images of the dataset and extract their descriptor features to train the SVM (roughly the loop sketched below), but I got stuck there since I found that the dataset doesn't actually seem to contain images at all...
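For illustration only, such a loop could look roughly like this, reusing the featureDetector and descriptorExtractor from the snippet above and averaging each image's descriptors into one fixed-length vector so that every training example has the same size (a crude stand-in for a proper Bag-of-Words step; the dataset/sample types are placeholders):

List<double[]> trainingVectors = new ArrayList<double[]>();
List<Integer> trainingLabels = new ArrayList<Integer>();

for (LabelledImage sample : dataset) {               // hypothetical dataset entry type
    Mat image = Highgui.imread(sample.path, Highgui.CV_LOAD_IMAGE_COLOR);
    MatOfKeyPoint keyPoints = new MatOfKeyPoint();
    Mat descriptors = new Mat();

    featureDetector.detect(image, keyPoints);
    descriptorExtractor.compute(image, keyPoints, descriptors);
    if (descriptors.rows() == 0) continue;            // no keypoints found in this image

    // average all descriptor rows (each SURF descriptor row has 64 values)
    double[] avg = new double[descriptors.cols()];
    for (int r = 0; r < descriptors.rows(); r++) {
        for (int c = 0; c < descriptors.cols(); c++) {
            avg[c] += descriptors.get(r, c)[0];
        }
    }
    for (int c = 0; c < avg.length; c++) {
        avg[c] /= descriptors.rows();
    }

    trainingVectors.add(avg);                         // one numeric vector per image
    trainingLabels.add(sample.label);                 // numeric class id for that image
}

Each row of trainingVectors plus its label is then in the numeric form an SVM (binary or multi-class) can be trained on.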
So I would appreciate any sort of help, descriptive steps to achieve this, or good resources I can look at.
Thanks
The classification using libsvm is always wrong, and it never changes the predicted label. (For example, I have 7 emotions; when I try to predict an image from outside the dataset it gives me 4, which is the happy emotion, and I get the same label when I try an image from the dataset.)
I extracted the image features using a Gabor filter with 6 orientations and 4 scales.
I used the grid.py script to find the optimal values for cost and gamma.
Finally, I used those parameters in the last step to train and obtain the model:
./svm-train -c 8 -g 0.03214 svm.train model.model
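For reference, the prediction step that produces those labels uses libsvm's svm-predict tool, and the test file has to be in the same sparse "label index:value" format as svm.train (the file names below are placeholders):

./svm-predict svm.test model.model predictions.txt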
I tried to change the kernel function and the svm-type, but the problem stays the same.
Is there any relation between the number of features I use in training and the number of images in the dataset?
Note: I used the Japanese Female Facial Expression (JAFFE) dataset.
I don't think you want to use an SVM for this image classification task. The task you describe (detecting emotions in images) requires you to feed extremely good features to your SVM to learn from, and a Gabor filter won't give you those.
I suggest you try a deep learning approach - for example, a convolutional neural network. These models are able to extract features from the raw image and then use them to classify the images.
Check this out:
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
There they use a deep network to find the facial key points in an image (i.e. the locations of the eyes, the nose tip, etc.). You may want to adjust the code a little so that it just classifies your images.
Once again, the power of a deep network is that it acts both as a feature extractor and as a classifier, which makes it an incredibly powerful tool for image recognition tasks, unlike an SVM of any type.
I am trying to classify legal case documents, which are in text format, into different categories (folders) such as Civil, Land, Criminal, etc. I intended to use Naive Bayes as a vectorizer to get vectors from the text documents and then feed those into an SVM to classify the documents, using Java-ML. I have implemented the preprocessing (stemming), and I used the Naive Bayes formulas described in http://eprints.nottingham.ac.uk/2995/1/Isa_Text.pdf to calculate the prior probability, likelihood, evidence, and posterior probability. I am assuming the posterior probabilities form the vector to be fed into the SVM, but I cannot format that output so that it can be fed into the SVM library.
I need all the help I can get with this; I hope I am doing things right.
I also have other legal cases as a test set that I want to classify into the right categories.
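If it helps, here is a rough sketch of the Java-ML side, assuming each document has already been reduced to a fixed-length array of numeric scores (its posterior probability per category); the exact SVM wrapper class depends on which libsvm integration is on the classpath, so that part is an assumption:

import net.sf.javaml.core.Dataset;
import net.sf.javaml.core.DefaultDataset;
import net.sf.javaml.core.DenseInstance;
import net.sf.javaml.core.Instance;

public class BuildTrainingSet {
    public static void main(String[] args) {
        // Each instance is the numeric vector computed for one document
        // (illustrative values) plus its known category as the class label.
        Dataset trainingSet = new DefaultDataset();

        double[] posteriors = { 0.72, 0.11, 0.09, 0.08 };   // Civil, Land, Criminal, ...
        Instance doc = new DenseInstance(posteriors, "Civil");
        trainingSet.add(doc);

        // Training and prediction then go through a Java-ML Classifier, e.g. the
        // libsvm wrapper shipped with Java-ML (exact class name is an assumption):
        //   Classifier svm = new LibSVM();
        //   svm.buildClassifier(trainingSet);
        //   Object predictedCategory = svm.classify(someTestInstance);
    }
}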
I have a numerical dataset in the format: class, unigram count, bigram count, sentiment. I went through some of the Apache Mahout documentation, and it was all about text data. I am aware that I need to perform three steps to classify: convert to sequence files, vectorize the sequence files, and pass them in to train the Naive Bayes classifier. But I am having a hard time understanding the difference between classifying a text dataset and classifying a numerical dataset in Mahout. What do I need to do differently in my case? I would appreciate any help.
As you might know, Mahout cannot use raw text data to train a model. If you start from a numerical dataset, the classification is actually easier, because the vectors Mahout handles are numerical data vectors.
I used Mahout on a text dataset, and I know that in that case I had to use a dictionary to convert the text data to numerical data. Some algorithms handle this better than others (for example, Naive Bayes strongly prefers text-like data).
So in your case, try other classifiers such as random forest or online logistic regression to obtain better results. In my experience with random forest, you can simply declare the type of each feature (in your case all your features are numerical), so the classification can be done pretty easily. If you want to stick with Naive Bayes, I am sure it is still possible to classify your numerical dataset, but I have never used it that way, so I cannot give more help.
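As a rough idea of what the "vectorize" step means for a numerical dataset (assuming the Hadoop and Mahout math libraries are on the classpath; the path and values below are made up), each row becomes a Mahout vector written to a sequence file, which the trainers then read:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class WriteVectors {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // one sequence file of (key = record id, value = numeric feature vector)
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, new Path("vectors/part-00000"), Text.class, VectorWritable.class);

        // example row: unigram count, bigram count, sentiment (illustrative numbers)
        double[] row = { 12.0, 3.0, 0.8 };
        writer.append(new Text("record-1"), new VectorWritable(new DenseVector(row)));

        writer.close();
    }
}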
I am doing a project on question classification using an SVM. Given a question, the system must be able to assign a class to it. For example, for the question "Where is the Taj Mahal located?", the task of question classification is to assign the label "Location" to this question, since the answer to it is a named entity of type "Location". So I know that I first have to provide a training dataset from which the model will learn, and then a testing dataset.
My training dataset contains a class, an index, and a value (the question text), all categorical:

class   index    value (question)
DESC    manner   How did serfdom develop in and then leave Russia ?

Likewise, I have 6 classes, 50 indices, and 1000 questions.
An SVM takes numerical values as input.
For implementing the SVM I have downloaded LIBSVM, which is a library for SVMs:
www.csie.ntu.edu.tw/~cjlin/libsvm/
I don't know how I should convert this data to LIBSVM format. Please help. (A rough sketch of one possible conversion is below.)
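For what it's worth, LIBSVM's training file format is one line per example of the form "<numeric label> <index1>:<value1> <index2>:<value2> ..." with the feature indices in ascending order. Below is a minimal Java sketch, assuming the classes are mapped to the integers 1-6 and a simple bag-of-words count over the question text supplies the numeric features (all names are illustrative, not part of LIBSVM itself):

import java.util.*;

public class LibsvmLineWriter {

    // Turn one labelled question into a line of LIBSVM's sparse format.
    static String toLibsvmLine(String questionClass, String question,
                               Map<String, Integer> classIds,
                               Map<String, Integer> wordIds) {
        int label = classIds.get(questionClass);          // e.g. DESC -> 1

        // count word occurrences, assigning each new word the next feature index
        SortedMap<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (String word : question.toLowerCase().split("\\s+")) {
            Integer id = wordIds.get(word);
            if (id == null) {
                id = wordIds.size() + 1;
                wordIds.put(word, id);
            }
            Integer old = counts.get(id);
            counts.put(id, old == null ? 1 : old + 1);
        }

        // "<label> <index>:<value> ..." with ascending indices
        StringBuilder line = new StringBuilder().append(label);
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            line.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> classIds = new HashMap<String, Integer>();
        classIds.put("DESC", 1);                          // ...and so on for the 6 classes
        Map<String, Integer> wordIds = new HashMap<String, Integer>();

        System.out.println(toLibsvmLine("DESC",
                "How did serfdom develop in and then leave Russia ?",
                classIds, wordIds));
        // prints: 1 1:1 2:1 3:1 4:1 5:1 6:1 7:1 8:1 9:1 10:1
    }
}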