My goal is to implement different image classification methods to show how they work, along with the advantages and disadvantages of each. The ones I want to try to implement using Java include:
Minimum distance classifier
k-nearest neighbour classifier
I was wondering what already exists in Java that I could use to accomplish this task, so that I can alter the way the algorithms operate.
Although I'm not entirely sure this is what you are looking for (sorry, your question is a bit unclear), if what you want is a library/system to help you with the classification part of the work, then you may want to look at Weka (http://www.cs.waikato.ac.nz/ml/weka/), in my opinion the best Java library for data mining experimentation.
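For instance, here is a minimal sketch of training Weka's k-nearest neighbour implementation (IBk) on pre-extracted features; the file name features.arff is a placeholder for whatever feature vectors you compute from your images:

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KnnDemo {
    public static void main(String[] args) throws Exception {
        // Load pre-extracted image features in ARFF format
        // ("features.arff" is a placeholder for your own file)
        Instances data = DataSource.read("features.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // k-nearest neighbour classifier with k = 3
        IBk knn = new IBk(3);
        knn.buildClassifier(data);

        // Sanity check: classify the first training instance
        double label = knn.classifyInstance(data.instance(0));
        System.out.println("Predicted: " + data.classAttribute().value((int) label));
    }
}
```

Because Weka exposes its classifiers through a common interface, you could swap IBk for another classifier (or subclass it) to experiment with how the algorithm operates.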
If, instead, you are looking for algorithms that would allow you to analyze images in order to extract features that can, in turn, be used to perform the classification, you may want to start with targeted descriptions of such algorithms in Java, such as those found in the nice on-line book Java Image Processing Cookbook by Rafael Santos; here's a direct link to the section "A Brief Tutorial on Supervised Image Classification".
You can also use RapidMiner with IMMI (IMage MIning) extension:
http://www.burgsys.com/mumi-image-mining-community.php
For image classification you can, for example, extract global features from each image and then feed them to a classification algorithm (e.g. an artificial neural network).
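As an illustration, here is a rough sketch (plain Java, no IMMI required) of one very simple global feature, a normalised grayscale histogram; the file name example.jpg is made up:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class GlobalFeatures {
    // A normalised 256-bin grayscale histogram: one simple global feature vector
    static double[] grayHistogram(BufferedImage img) {
        double[] hist = new double[256];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                hist[gray]++;
            }
        }
        // Normalise so images of different sizes are comparable
        double total = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < hist.length; i++) hist[i] /= total;
        return hist;
    }

    public static void main(String[] args) throws Exception {
        double[] features = grayHistogram(ImageIO.read(new File("example.jpg")));
        System.out.println("Number of feature dimensions: " + features.length);
    }
}
```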
I have a Java application where I'm looking to determine in real time whether a given piece of text is talking about a topic supplied as a query.
Some techniques I've looked into for this are coreference resolution with packages like OpenNLP and Stanford CoreNLP, but these models take extremely long to load and don't seem practical in a production environment. Is it possible to perform coreference analysis such that, given a piece of text and a topic, I can get a boolean answer as to whether the text is discussing the topic?
Other than document classification which requires a trained corpus, are there any other techniques that can help me achieve such a thing?
I suggest having a look at Weka. It is written in Java so it will gel well with your environment, it will be faster for your kind of requirement, it has lots of tools, and it comes with a UI as well as an API. If you are looking for an unsupervised approach (that is, one without any learning from a pre-classified corpus), here is an interesting paper: http://www.newdesign.aclweb.org/anthology/C/C00/C00-1066.pdf
You can also search for "unsupervised text classification / information retrieval" on Google. You will get lots of approaches. You can choose the one you find easiest.
For each topic (if they are predefined) you can create a list of terms; then, for each sentence, compute the cosine similarity between the sentence and each topic's term list, and report the closest topic to the user, as sketched below.
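A minimal sketch of that idea in plain Java, assuming whitespace tokenisation and raw term frequencies (the sample sentence and topic terms are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {
    // Build a term-frequency map from lower-cased, tokenised text
    static Map<String, Integer> termFreq(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * (double) other;
            normA += e.getValue() * (double) e.getValue();
        }
        for (int v : b.values()) normB += v * (double) v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        String sentence = "the new electric car charges quickly";
        String topicTerms = "car vehicle engine electric battery charge";
        System.out.printf("Similarity: %.3f%n", cosine(termFreq(sentence), termFreq(topicTerms)));
    }
}
```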
Is it possible to analyse an image and determine the position of a car inside it?
If so, how would you approach this problem?
I'm working with a relatively small data-set (50-100) and most images will look similar to the following examples:
I'm mostly interested in only detecting vertical coordinates, not the actual shape of the car. For example, this is the area I want to highlight as my final output:
You could try OpenCV, which has an object detection API. But you would need to "train" it by supplying it with a large set of images that contain cars.
http://docs.opencv.org/modules/objdetect/doc/objdetect.html
http://robocv.blogspot.co.uk/2012/02/real-time-object-detection-in-opencv.html
http://blog.davidjbarnes.com/2010/04/opencv-haartraining-object-detection.html
The second link above shows an example of detecting and creating a bounding box around the object; you could use that as a basis for what you want to do.
http://www.behance.net/gallery/Vehicle-Detection-Tracking-and-Counting/4057777
Various papers:
http://cbcl.mit.edu/publications/theses/thesis-masters-leung.pdf
http://cseweb.ucsd.edu/classes/wi08/cse190-a/reports/scheung.pdf
Various image databases:
http://cogcomp.cs.illinois.edu/Data/Car/
http://homepages.inf.ed.ac.uk/rbf/CVonline/Imagedbase.htm
http://cbcl.mit.edu/software-datasets/CarData.html
1) Your first and second images have two cars in them.
2) If you only have 50-100 images, I can almost guarantee that classifying them all by hand will be faster than writing or adapting an algorithm to recognize cars and deliver coordinates.
3) If you're determined to do this with computer vision, I'd recommend OpenCV. Tutorial here: http://docs.opencv.org/doc/tutorials/tutorials.html
You can use the OpenCV LatentSVM detector to detect the car and plot a bounding box around it:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html
There is no need to train a new model with Haar cascades, as there is already a trained model for cars:
https://github.com/Itseez/opencv_extra/tree/master/testdata/cv/latentsvmdetector/models_VOC2007
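One caveat: as far as I know the LatentSVM detector is only exposed through OpenCV's C/C++ API. If you are set on Java, the closest readily available interface is CascadeClassifier, which does assume you have a trained cascade file. A rough sketch using the OpenCV 3.x Java bindings, where cars.xml and street.jpg are hypothetical placeholders:

```java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.core.Scalar;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;
import org.opencv.objdetect.CascadeClassifier;

public class CarDetector {
    public static void main(String[] args) {
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // "cars.xml" stands in for whatever trained car cascade you use
        CascadeClassifier detector = new CascadeClassifier("cars.xml");
        Mat image = Imgcodecs.imread("street.jpg");

        MatOfRect detections = new MatOfRect();
        detector.detectMultiScale(image, detections);

        // Draw a bounding box around each detection
        for (Rect r : detections.toArray()) {
            Imgproc.rectangle(image, new Point(r.x, r.y),
                    new Point(r.x + r.width, r.y + r.height),
                    new Scalar(0, 255, 0), 2);
        }
        Imgcodecs.imwrite("detected.jpg", image);
    }
}
```

Since you only care about vertical coordinates, you could read r.y and r.y + r.height from each detected rectangle and ignore the rest.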
This is a supervised machine learning problem. You will need to use an API that features learning algorithms, as colinsmith suggested, or do some research and write one of your own. Python is pretty good for machine learning (it's what I use, personally) and has some nice tools like scikit-learn: http://scikit-learn.org/stable/
I'd suggest you look into Haar classifiers. Since you mentioned you have a set of 50-100 images, you can use it to build up a training dataset for the classifier and then apply the classifier to your images.
You can also look into the SURF and SIFT algorithms for this problem.
I don't know the name of this visualization type, but I want to learn how to draw trees like the ones in this image:
I've seen this kind of visualization on many sites, but I haven't been able to find the technical term behind it.
That graph looks a lot like a force-directed layout. Painting that kind of image is not an easy task; depending on what you are trying to accomplish, you might want to use an existing framework. If you want to use Java you should look at Gephi; if you can use an HTML approach you should definitely take a look at d3.js, which is a JavaScript library for data visualization. They have neat examples: directed-force layout, and collapsible-force layout.
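If you do want to roll your own rather than use a framework, the core of a force-directed (Fruchterman-Reingold style) layout is just an iterated loop of repulsive and attractive forces. A toy sketch in plain Java, with a made-up four-node graph:

```java
import java.util.Random;

public class ForceLayout {
    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {2, 0}, {2, 3}};
        int n = 4;
        double[] x = new double[n], y = new double[n];
        Random rnd = new Random(42);
        for (int i = 0; i < n; i++) { x[i] = rnd.nextDouble(); y[i] = rnd.nextDouble(); }

        double k = 0.3; // ideal edge length
        for (int iter = 0; iter < 500; iter++) {
            double[] fx = new double[n], fy = new double[n];
            // Repulsive force between every pair of nodes
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double dx = x[i] - x[j], dy = y[i] - y[j];
                    double d = Math.max(Math.hypot(dx, dy), 1e-6);
                    double f = k * k / d;
                    fx[i] += f * dx / d; fy[i] += f * dy / d;
                    fx[j] -= f * dx / d; fy[j] -= f * dy / d;
                }
            }
            // Attractive force along edges
            for (int[] e : edges) {
                double dx = x[e[0]] - x[e[1]], dy = y[e[0]] - y[e[1]];
                double d = Math.max(Math.hypot(dx, dy), 1e-6);
                double f = d * d / k;
                fx[e[0]] -= f * dx / d; fy[e[0]] -= f * dy / d;
                fx[e[1]] += f * dx / d; fy[e[1]] += f * dy / d;
            }
            // Move each node a small step along its net force
            double step = 0.01;
            for (int i = 0; i < n; i++) { x[i] += step * fx[i]; y[i] += step * fy[i]; }
        }
        for (int i = 0; i < n; i++)
            System.out.printf("node %d: (%.3f, %.3f)%n", i, x[i], y[i]);
    }
}
```

Frameworks like Gephi and d3.js do essentially this, plus cooling schedules and rendering, which is why they are usually the better choice.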
This particular image is done by Stephanie Posavec. You can learn about her design process from an interview she gave the folks at the Data Stories podcast. As far as I remember it, she partially crafts her visualizations by hand, so I'm not sure if you'll ever find an algorithm that does exactly this for you. For different tree layout algorithms, you can refer to treevis.net.
We are interested in doing binary classification of web pages present across the web e.g. Ecommerce vs Non-Ecommerce.
Currently, we are using Mahout library with Naive Bayes algorithm. We are creating training data from existing classified URLs and feature set from the same.
What is the best possible way in terms of accuracy to perform this task?
I need help in terms of algorithms, libraries (usable with Java), or any better ideas that help with this type of classification.
Thanks in advance.
The question is quite general so I can add only general information.
The ways to improve the quality of your classification are (in order of importance):
use lemmatisation and/or stemming to reduce words to their base forms
implement a stop-word filter to remove uninformative words
train separate classifiers for different languages
You may also try an existing, well-tuned program. CRM114 is designed to be a spam filter, but it is generic enough to do what you want; people use it to sort résumés and the like. It has lots of engines (HMM, SVM, CLUMP, Bayes, etc.). Give it a try.
This one is a very good demonstration of the Naive Bayes classifier algorithm.
Discarding the most common words would lead to better predictions, and IDF can be a good tool for filtering them out. Also see Wikipedia.
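A quick sketch of computing IDF scores in plain Java (the toy corpus below is made up); terms whose IDF falls under some threshold are the common ones worth discarding:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class Idf {
    // Compute inverse document frequency for every term in a corpus.
    // Each document is represented as a list of already-tokenised terms.
    static Map<String, Double> idf(List<List<String>> documents) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : documents) {
            // Count each term at most once per document
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        int n = documents.size();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            // Common terms (high document frequency) get low IDF scores
            idf.put(e.getKey(), Math.log((double) n / e.getValue()));
        }
        return idf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("buy", "shoes", "online"),
                List.of("read", "the", "news"),
                List.of("buy", "the", "book"));
        idf(corpus).forEach((t, v) -> System.out.printf("%s: %.3f%n", t, v));
    }
}
```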
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier Transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it, though I'm not sure how to convert its output into an actual result
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images it doesn't look like it's going to produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image with a Sobel filter first and then comparing the results. The problem is the actual comparison part.
So yeah, I'm looking to see if anyone has ideas for comparing images in Java, hoping there are people here who have done similar projects before. I just want some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier Transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are when optimally aligned.
Sift descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level. But since you are just comparing color distributions you could end up declaring an outdoor scene with lots of snow as similar to an indoor scene with lots of white wallpaper...
Pixel grabbing - No idea what this is...
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful if you're comparing images that have been compressed with different algorithms, for example (e.g. JPEG vs JPEG2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
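To give you a feel for the math before you commit, here is a rough single-window simplification of SSIM in plain Java; a faithful implementation slides a small (e.g. 8x8) window over the images and averages the per-window scores, and the file names a.jpg and b.jpg are placeholders:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class Ssim {
    // Simplified SSIM computed over the whole image as a single window.
    // Assumes both images show the same scene and have (roughly) equal size.
    static double ssim(BufferedImage a, BufferedImage b) {
        int w = Math.min(a.getWidth(), b.getWidth());
        int h = Math.min(a.getHeight(), b.getHeight());
        double n = (double) w * h;
        double muA = 0, muB = 0;
        double[][] ga = new double[w][h], gb = new double[w][h];
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                ga[x][y] = gray(a.getRGB(x, y));
                gb[x][y] = gray(b.getRGB(x, y));
                muA += ga[x][y];
                muB += gb[x][y];
            }
        }
        muA /= n; muB /= n;
        double varA = 0, varB = 0, cov = 0;
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                varA += (ga[x][y] - muA) * (ga[x][y] - muA);
                varB += (gb[x][y] - muB) * (gb[x][y] - muB);
                cov  += (ga[x][y] - muA) * (gb[x][y] - muB);
            }
        }
        varA /= n - 1; varB /= n - 1; cov /= n - 1;
        // Standard SSIM stabilising constants for 8-bit dynamic range
        double c1 = Math.pow(0.01 * 255, 2), c2 = Math.pow(0.03 * 255, 2);
        return ((2 * muA * muB + c1) * (2 * cov + c2))
             / ((muA * muA + muB * muB + c1) * (varA + varB + c2));
    }

    static double gray(int rgb) {
        return (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3.0;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img1 = ImageIO.read(new File("a.jpg"));
        BufferedImage img2 = ImageIO.read(new File("b.jpg"));
        System.out.printf("SSIM: %.4f%n", ssim(img1, img2));
    }
}
```

The result lies in [-1, 1] with 1 meaning identical, so mapping it to a percentage for your UI is straightforward.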