SVM Classifier based on SURF descriptors - Java

I am a college student and obviously a newbie in machine learning, so please bear with me.
I am implementing a Java application that recognizes and classifies road/traffic signs, and my main problem is creating and training an SVM with SURF descriptors.
I have read a lot and came across many different things about SVMs, which left me even more confused, but I will try to clarify what I have understood.
FIRST: I know that I must have a dataset that includes positive images (images that contain my objects) and negative images (images that don't) to train the SVM. I tried to look at how it is done in Python, due to the lack of documentation in Java, and came across this code:
import numpy as np
# each row of the CSV holds one sample as numbers: a feature vector,
# typically followed by a class label in the last column
dataset = np.loadtxt('./datasetExample.csv', delimiter=",")
And it was as simple as that. What is the CSV doing here? Where are the images of the dataset? I know that the data has to be represented as numbers, like inside the CSV file, but where did those numbers come from, and what do they have to do with the SVM?
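For context, a likely explanation: a CSV like that usually holds feature vectors computed from the images in an earlier step, one row per image with the class label as the last column, so the loader never touches the images themselves. Below is a minimal Java sketch of writing such a row; the one-row features Mat and the label-last layout are assumptions, not something stated in the original post.

import java.io.FileWriter;
import java.io.IOException;
import org.opencv.core.Mat;

class CsvExport {
    // Append one image's fixed-length feature vector plus its class label
    // as one CSV row. 'features' is assumed to be a 1 x N float Mat
    // (e.g. a bag-of-words histogram) computed in an earlier step.
    static void appendRow(FileWriter out, Mat features, int label) throws IOException {
        StringBuilder row = new StringBuilder();
        for (int c = 0; c < features.cols(); c++) {
            row.append(features.get(0, c)[0]).append(',');
        }
        row.append(label).append('\n'); // last column: the class label
        out.write(row.toString());
    }
}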
SECOND: In almost all the resources I found, SVMs are trained in one of two ways, on HOG descriptors or on bag-of-words histograms, and I didn't find a SURF-descriptor method (actually I am not sure it is even possible, but my professor said it can be done).
THIRD: Since I am classifying traffic signs I need more than one class (e.g. one for warning signs, one for regulatory signs, etc.), and each class of course has sub-classes; the speed-limit signs, for example, include several different signs. I came across something called a multi-class SVM, and I really don't know what that is!
Currently I manage to extract SURF descriptors from a given image using this code:
// load the sign image (OpenCV 2.4.x Java API)
Mat objectImage = Highgui.imread(signObject, Highgui.CV_LOAD_IMAGE_COLOR);
// detect SURF keypoints, then compute one descriptor per keypoint
featureDetector.detect(objectImage, objectKeyPoints);
descriptorExtractor.compute(objectImage, objectKeyPoints, objectDescriptors);
// keep the per-image results for later training
datasetObjImage.add(objectImage);
datasetKeyPoints.add(objectKeyPoints);
datasetDescriptors.add(objectDescriptors);
What I was planning to do is loop over all the images of the dataset and extract their descriptors to train the SVM, but I got stuck there, since I found that the dataset doesn't actually contain images at all...
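For what it's worth, here is one hedged sketch of that plan, assuming the same OpenCV 2.4 Java bindings as the snippet above: pool the SURF descriptors of all training images, cluster them into a visual vocabulary with k-means, represent each image as a fixed-length histogram of visual words (this is the bag-of-words approach most resources mention, just applied to SURF), and train CvSVM on the histograms. CvSVM handles multiple class labels natively, which also covers the multi-class worry above. Everything here (trainPaths, labels, the vocabulary size, the kernel) is illustrative, not taken from the original post.

import java.util.List;
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.core.Scalar;
import org.opencv.core.TermCriteria;
import org.opencv.features2d.DescriptorExtractor;
import org.opencv.features2d.FeatureDetector;
import org.opencv.highgui.Highgui;
import org.opencv.ml.CvSVM;
import org.opencv.ml.CvSVMParams;

class SurfBowSvm {
    static final int VOCAB_SIZE = 200; // number of visual words (tunable)

    // trainPaths/labels: one entry per training image; label = class id
    // (e.g. 0 = warning, 1 = regulatory, ...).
    static CvSVM train(List<String> trainPaths, List<Integer> labels) {
        FeatureDetector detector = FeatureDetector.create(FeatureDetector.SURF);
        DescriptorExtractor extractor = DescriptorExtractor.create(DescriptorExtractor.SURF);

        // 1) Pool the SURF descriptors of every training image.
        Mat pooled = new Mat();
        Mat[] perImage = new Mat[trainPaths.size()];
        for (int i = 0; i < trainPaths.size(); i++) {
            Mat img = Highgui.imread(trainPaths.get(i), Highgui.CV_LOAD_IMAGE_GRAYSCALE);
            MatOfKeyPoint kp = new MatOfKeyPoint();
            Mat desc = new Mat();
            detector.detect(img, kp);
            extractor.compute(img, kp, desc);
            perImage[i] = desc;
            pooled.push_back(desc);
        }

        // 2) Cluster the pooled descriptors into a visual vocabulary.
        Mat vocab = new Mat();
        Core.kmeans(pooled, VOCAB_SIZE, new Mat(),
                new TermCriteria(TermCriteria.EPS + TermCriteria.MAX_ITER, 100, 1e-4),
                3, Core.KMEANS_PP_CENTERS, vocab);

        // 3) Turn each image into a fixed-length histogram of visual words.
        Mat trainData = new Mat(trainPaths.size(), VOCAB_SIZE, CvType.CV_32F, Scalar.all(0));
        for (int i = 0; i < perImage.length; i++) {
            for (int r = 0; r < perImage[i].rows(); r++) {
                int word = nearestWord(perImage[i].row(r), vocab);
                double[] bin = trainData.get(i, word);
                trainData.put(i, word, bin[0] + 1);
            }
        }

        // 4) Train one multi-class SVM on the histograms.
        Mat labelMat = new Mat(labels.size(), 1, CvType.CV_32F);
        for (int i = 0; i < labels.size(); i++) labelMat.put(i, 0, labels.get(i));
        CvSVMParams params = new CvSVMParams();
        params.set_kernel_type(CvSVM.LINEAR); // a common starting point
        CvSVM svm = new CvSVM();
        svm.train(trainData, labelMat, new Mat(), new Mat(), params);
        return svm;
    }

    // Brute-force nearest cluster centre; fine for a sketch.
    static int nearestWord(Mat desc, Mat vocab) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int r = 0; r < vocab.rows(); r++) {
            double d = Core.norm(desc, vocab.row(r), Core.NORM_L2);
            if (d < bestDist) { bestDist = d; best = r; }
        }
        return best;
    }
}

At test time, the same nearestWord() quantization turns a new image into a histogram that svm.predict() can label.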
So please, I would appreciate any sort of help, descriptive steps to achieve this, or even good resources I can look at.
Thanks

Related

Pre-trained vectors, NLP, word2vec, word embeddings for a particular topic?

Is there any pre-trained vector set for a particular topic only? For example "java": I want vectors related to Java in one file, meaning that if I give the input "inheritance", then cosine similarity should show me "polymorphism" and other related terms only.
I am using GoogleNews-vectors-negative300.bin and GloVe vectors as my corpora, but I am still not getting related words.
Not sure if I understand your question/problem statement, but if you want to work with a corpus of Java source code you can use code2vec, which provides pre-trained word-embedding models. Check it out: https://code2vec.org/
Yes, you can occasionally find other groups' pre-trained vectors for download, which may have better coverage of whatever problem domains they've been trained on: both more specialized words, and word-vectors matching the word sense in that domain.
For example, the GoogleNews word-vectors were trained on news articles circa 2012, so their vector for 'Java' may be dominated by stories about the Indonesian island of Java as much as by the programming language. And many other vector sets are trained on Wikipedia text, which will be dominated by usages in that particular reference style of writing. But there could be other sets that better emphasize the word senses you need.
However, the best approach is often to train your own word-vectors, from a training corpus that closely matches the topics/documents you are concerned about. Then, the word-vectors are well-tuned to your domain-of-concern. As long as you have "enough" varied examples of a word used in context, the resulting vector will likely be better than generic vectors from someone else's corpus. ("Enough" has no firm definition, but is usually at least 5, and ideally dozens to hundreds, of representative, diverse uses.)
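As a concrete starting point for training your own vectors, here is a minimal sketch using deeplearning4j (which another question on this page already uses); the corpus file name and every hyperparameter are illustrative, not prescriptive:

import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizer.preprocessor.CommonPreprocessor;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

class TrainDomainVectors {
    public static void main(String[] args) throws Exception {
        // java_corpus.txt is hypothetical: one sentence per line, drawn from
        // domain text (textbooks, articles, Stack Overflow posts, ...).
        SentenceIterator iter = new BasicLineIterator("java_corpus.txt");
        TokenizerFactory tokenizer = new DefaultTokenizerFactory();
        tokenizer.setTokenPreProcessor(new CommonPreprocessor());

        Word2Vec vec = new Word2Vec.Builder()
                .minWordFrequency(5)   // ignore words with fewer than ~5 uses
                .layerSize(100)        // vector dimensionality
                .windowSize(5)
                .iterate(iter)
                .tokenizerFactory(tokenizer)
                .build();
        vec.fit();

        // Words sharing contexts in the corpus should now land near each other.
        System.out.println(vec.wordsNearest("inheritance", 10));
    }
}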
Let's consider your example goal – showing some similarity between the ideas of 'polymorphism' and 'input inheritance'. For that, you'd need a training corpus that discusses those concepts, ideally many times, from many authors, in many problem-contexts. (Textbooks, online articles, and Stack Overflow pages might be possible sources.)
You'd further need a tokenization strategy that manages to create a single word-token for the two-word concept 'input_inheritance' - which is a separate challenge, and might be tackled via (1) a hand-crafted glossary of multi-word-phrases that should be combined; (2) statistical analysis of word-pairs that seem to occur so often together, they should be combined; (3) more sophisticated grammar-aware phrase- and entity-detection preprocessing.
(The multiword phrases in the GoogleNews set were created via a statistical algorithm, which is also available in the gensim Python library as the Phrases class. But the exact parameters Google used have not, as far as I know, been revealed. And good results from this algorithm can require a lot of data and tuning, and it will still produce some combinations a person would consider nonsense while missing others a person would consider natural.)
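As a toy illustration of option (2) above, here is a simplified Java version of that kind of statistical pairing, loosely following the word2phrase-style score (count(a,b) - minCount) / (count(a) * count(b)). A real implementation would count over the whole corpus rather than one token array, and the threshold is corpus-dependent:

import java.util.HashMap;
import java.util.Map;

class PhraseCombiner {
    // Pairs whose co-occurrence count is high relative to each word's own
    // frequency get joined into a single token with '_'.
    static String combine(String[] tokens, double threshold, int minCount) {
        Map<String, Integer> uni = new HashMap<>(), bi = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            uni.merge(tokens[i], 1, Integer::sum);
            if (i + 1 < tokens.length) bi.merge(tokens[i] + " " + tokens[i + 1], 1, Integer::sum);
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i + 1 < tokens.length) {
                double pair = bi.getOrDefault(tokens[i] + " " + tokens[i + 1], 0);
                double score = (pair - minCount) / ((double) uni.get(tokens[i]) * uni.get(tokens[i + 1]));
                if (score > threshold) { // join the bigram into one token
                    out.append(tokens[i]).append('_').append(tokens[i + 1]).append(' ');
                    i++; // skip the second word of the joined pair
                    continue;
                }
            }
            out.append(tokens[i]).append(' ');
        }
        return out.toString().trim();
    }
}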

How to combine two pre-trained Word2Vec models?

I successfully followed the deeplearning4j.org tutorial on Word2Vec, so I am able to load an already-trained model or train a new one based on some raw text (more specifically, I am using GoogleNews-vectors-negative300 and the Emoji2Vec pre-trained model).
However, I would like to combine these two above models for the following reason: Having a sentence (for example, a comment from Instagram or Twitter, which consists of emoji), I want to identify the emoji in the sentence and then map it to the word it is related to. In order to do that, I was planning to iterate over all the words in the sentence and calculate the closeness (how near the emoji and the word are located in the vector space).
I found code showing how to uptrain an already existing model. However, it is mentioned that new words are not added in this case; only the weights of existing words will be updated based on the new text corpus.
I would appreciate any help or ideas on the problem I have. Thanks in advance!
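As a hedged sketch of the closeness computation described above, assuming deeplearning4j (which the question already uses) and a single model file whose vocabulary contains both words and emoji tokens; getting both into one model is exactly the merging problem the answer below addresses. The file name is hypothetical, and readWord2VecModel is from recent DL4J versions:

import java.io.File;
import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;

class EmojiCloseness {
    public static void main(String[] args) throws Exception {
        // merged-vectors.bin is hypothetical: one model covering words AND emoji.
        WordVectors vec = WordVectorSerializer.readWord2VecModel(new File("merged-vectors.bin"));
        String emoji = "\uD83D\uDE0A"; // the 'smiling face' emoji as a token
        for (String word : new String[]{"happy", "sad", "weather"}) {
            // similarity() is cosine similarity in the embedding space
            System.out.printf("%s ~ %s : %.3f%n", emoji, word, vec.similarity(emoji, word));
        }
    }
}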
Combining two models trained from different corpuses is not a simple, supported operation in the word2vec libraries with which I'm most familiar.
In particular, even if the same word appears in both corpuses, and even in similar contexts, the randomization used by this algorithm during initialization and training, plus the extra randomization injected by multithreaded training, mean that the word may appear in wildly different places. It's only the relative distances/orientations with respect to other words that should be roughly similar, not the specific coordinates/rotations.
So to merge two models requires translating one's coordinates to the other. That in itself will typically involve learning-a-projection from one space to the other, then moving unique words from a source space to the surviving space. I don't know if DL4J has a built-in routine for this; the Python gensim library has a TranslationMatrix example class in recent versions which can do this, as motivated by the use of word-vectors for language-to-language translations.
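Here is a bare-bones illustration of that learn-a-projection step, in plain Java rather than any particular library's routine: take words present in both models, stack their source vectors as rows of X and their target vectors as rows of Y, fit a linear map W by least squares, then move each source-only word into the target space as v*W. This is a sketch of the general technique, not DL4J's or gensim's actual implementation, and you need comfortably more shared anchor words than dimensions for the solve to be well-conditioned.

class VectorSpaceAligner {

    // Learn a linear map W taking source-space vectors onto target-space
    // vectors, by least squares over words present in BOTH models:
    //   minimize ||X W - Y||^2  =>  (X^T X) W = X^T Y
    static double[][] learnProjection(double[][] X, double[][] Y) {
        int d = X[0].length, k = Y[0].length;
        double[][] xtx = new double[d][d];
        double[][] xty = new double[d][k];
        for (int n = 0; n < X.length; n++) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) xtx[i][j] += X[n][i] * X[n][j];
                for (int j = 0; j < k; j++) xty[i][j] += X[n][i] * Y[n][j];
            }
        }
        return solve(xtx, xty); // W is d x k
    }

    // Gauss-Jordan elimination with partial pivoting, solving A * W = B.
    static double[][] solve(double[][] a, double[][] b) {
        int n = a.length, k = b[0].length;
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            double[] t = a[col]; a[col] = a[pivot]; a[pivot] = t;
            t = b[col]; b[col] = b[pivot]; b[pivot] = t;
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c < n; c++) a[r][c] -= f * a[col][c];
                for (int c = 0; c < k; c++) b[r][c] -= f * b[col][c];
            }
        }
        for (int r = 0; r < n; r++)
            for (int c = 0; c < k; c++) b[r][c] /= a[r][r];
        return b;
    }

    // Carry a source-space vector into the target space: v' = v W.
    static double[] project(double[] v, double[][] W) {
        double[] out = new double[W[0].length];
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < out.length; j++) out[j] += v[i] * W[i][j];
        return out;
    }
}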

How to determine the position of a car inside an image?

Is it possible to analyse an image and determine the position of a car inside it?
If so, how would you approach this problem?
I'm working with a relatively small data-set (50-100 images), and most images will look similar to the following examples:
I'm mostly interested in only detecting vertical coordinates, not the actual shape of the car. For example, this is the area I want to highlight as my final output:
You could try OpenCV, which has an object detection API. But you would need to "train" it by supplying it with a large set of images that contain cars.
http://docs.opencv.org/modules/objdetect/doc/objdetect.html
http://robocv.blogspot.co.uk/2012/02/real-time-object-detection-in-opencv.html
http://blog.davidjbarnes.com/2010/04/opencv-haartraining-object-detection.html
Look at the 2nd link above; it shows an example of detecting and creating a bounding box around the object. You could use that as a basis for what you want to do; a minimal sketch follows the link lists below.
http://www.behance.net/gallery/Vehicle-Detection-Tracking-and-Counting/4057777
Various papers:
http://cbcl.mit.edu/publications/theses/thesis-masters-leung.pdf
http://cseweb.ucsd.edu/classes/wi08/cse190-a/reports/scheung.pdf
Various image databases:
http://cogcomp.cs.illinois.edu/Data/Car/
http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://cbcl.mit.edu/software-datasets/CarData.html
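Pulling the links above together, here is a minimal detection sketch using OpenCV's Java cascade API; cars.xml stands in for whatever cascade you train (e.g. with opencv_traincascade) or download, and since only the vertical coordinates are wanted, it simply merges the y-extents of the detected boxes:

import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Rect;
import org.opencv.highgui.Highgui;
import org.opencv.objdetect.CascadeClassifier;

class CarBand {
    public static void main(String[] args) {
        // cars.xml is hypothetical: a cascade trained (or obtained) for cars.
        CascadeClassifier cars = new CascadeClassifier("cars.xml");
        Mat img = Highgui.imread("road.jpg", Highgui.CV_LOAD_IMAGE_GRAYSCALE);

        MatOfRect hits = new MatOfRect();
        cars.detectMultiScale(img, hits);

        // Only the vertical extent is wanted: merge the y-ranges of all boxes.
        int top = img.rows(), bottom = 0;
        for (Rect r : hits.toArray()) {
            top = Math.min(top, r.y);
            bottom = Math.max(bottom, r.y + r.height);
        }
        if (bottom > top)
            System.out.println("car band: rows " + top + " to " + bottom);
        else
            System.out.println("no car detected");
    }
}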
1) Your first and second images have two cars in them.
2) If you only have 50-100 images, I can almost guarantee that classifying them all by hand will be faster than writing or adapting an algorithm to recognize cars and deliver coordinates.
3) If you're determined to do this with computer vision, I'd recommend OpenCV. Tutorial here: http://docs.opencv.org/doc/tutorials/tutorials.html
You can use OpenCV's latent SVM detector to detect the car and plot a bounding box around it:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
No need to train a new model using HaarCascade, as there is already a trained model for cars:
https://github.com/Itseez/opencv_extra/tree/master/testdata/cv/latentsvmdetector/models_VOC2007
This is a supervised machine learning problem. You will need to use an API that features learning algorithms, as colinsmith suggested, or do some research and write one of your own. Python is pretty good for machine learning (it's what I use, personally) and has some nice tools like scikit-learn: http://scikit-learn.org/stable/
I'd suggest you look into Haar classifiers. Since you mentioned you have a set of 50-100 images, you can use them to build up a training dataset for the classifier and use it to classify your images.
You can also look into SURF and SIFT algorithms for the specified problem.

Is it possible to automatically generate descriptors from a data set for use with an ANN?

I would like to classify a dataset automatically into several classes. Is it possible to train a neural net without coding any descriptors?
I am classifying a set of fixed-size pictures, but I do not really want to write a set of descriptors for them. Is there a way I can classify my set with little effort?
I have a large dataset and only 7-8 classes in which to classify.
I would be extremely happy if I could snag some sample code along the way :)
There is a very broad class of neural networks. For what you're doing, you will want to look for one based on unsupervised learning. Typically, these are based on Hebb's rule.
For your case, you might find competitive learning to be suitable. Essentially, you set an output neuron for each class (so the 7-8 you're expecting), and strengthen the weights to the most active for a given input pattern. This results in clustering; input patterns that are similar activate the same output neuron, strengthening those connections and causing the neurons to specialize for the different classes.
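Here is a bare-bones sketch of that competitive-learning rule, in plain Java with no library: one weight vector per expected class, the closest ("most active") neuron wins each input, and only the winner is nudged toward it. The learning rate and random initialization are arbitrary choices for illustration.

import java.util.Random;

class CompetitiveLearner {
    final double[][] w; // one weight vector ("output neuron") per class
    final double lr;    // learning rate

    CompetitiveLearner(int classes, int inputDim, double lr, long seed) {
        this.lr = lr;
        w = new double[classes][inputDim];
        Random rnd = new Random(seed);
        for (double[] row : w)
            for (int i = 0; i < row.length; i++) row[i] = rnd.nextDouble();
    }

    // The neuron whose weights are closest to the input wins...
    int winner(double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < w.length; k++) {
            double d = 0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - w[k][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    // ...and only the winner's weights are strengthened toward the input,
    // so similar inputs come to activate, and specialize, the same neuron.
    void train(double[] x) {
        int k = winner(x);
        for (int i = 0; i < x.length; i++)
            w[k][i] += lr * (x[i] - w[k][i]);
    }
}

Feeding it the flattened pixel vectors of your pictures for several passes leaves each neuron as a prototype of one cluster, and winner(x) then acts as the classifier. Note that the discovered clusters may not line up exactly with the 7-8 classes you have in mind.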

Java speech recognition like Android's

I'm looking for speech-recognition software for Java that acts more like the Android version, in that, instead of having .gram files and such, it just returns a string of what was said, and I can act on it. I've tried using Sphinx-4, but using .gram files makes my program a lot harder to write.
The point of a grammar file is to improve the accuracy of what you're getting back. Instead of trying to come up with random strings of English words, you tell it to expect specific input.
That said, sphinx-4 can do ordinary large-dictionary ASR as well. Read the N-Gram part of this tutorial and look at the Transcriber sample that comes with the sphinx source code.
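For example, with the newer sphinx-4 API (the class names below are from the 5prealpha release and may differ in older versions), large-dictionary recognition needs no .gram file at all; a statistical language model takes its place and the result comes back as a plain string:

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.LiveSpeechRecognizer;
import edu.cmu.sphinx.api.SpeechResult;

class DictationDemo {
    public static void main(String[] args) throws Exception {
        Configuration cfg = new Configuration();
        // Stock US-English models shipped with sphinx4-data; the language
        // model below replaces any .gram grammar.
        cfg.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        cfg.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        cfg.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(cfg);
        recognizer.startRecognition(true);            // true = clear previous audio buffer
        SpeechResult result = recognizer.getResult(); // blocks until an utterance ends
        System.out.println("You said: " + result.getHypothesis());
        recognizer.stopRecognition();
    }
}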
In addition, you can train your own trigram model that will enhance the results you get. (E.g., place more probability on the word "weather" being detected.) This is certainly what Siri does. Apple/Google have a huge corpus of pieces of audio that people speak into their phones, part of which is human transcribed, from which they train both acoustic and linguistic models (so their engines detect things people typically say instead of nonsense).
