What is the meaning of jitter in the Visualize tab of Weka (Java)?

In Weka I load an ARFF file. I can view the relationship between attributes using the Visualize tab.
However, I can't understand the meaning of the jitter slider. What is its purpose?

You can find the answer in the mailing list archives:
The jitter function in the Visualize panel just adds artificial random noise to the coordinates of the plotted points in order to spread the data out a bit (so that you can see points that might have been obscured by others).

I don't know Weka, but generally jitter is a term for the deviation of a periodic signal from some reference timing. I'm guessing the slider allows you to set some range or threshold below which data points are treated as being regular, or to modify the output to introduce some variation. The Wikipedia entry can give you some background.
Update: according to this PDF, the jitter slider is for this purpose:
“Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
Based on the accompanying slide, it looks like it introduces some variation into the visualisation, perhaps to show when two data points overlap.
Update 2: This Google Books extract (from Data Mining by Ian H. Witten and Eibe Frank) seems to confirm my guess:
[jitter] is a random displacement applied to X and Y values to separate points that lie on top of one another. Without jitter, 1000 instances at the same data point would look just the same as 1 instance.
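To make that concrete, here is a minimal, hypothetical sketch (not Weka's actual code) of what such a jitter displacement could look like in Java; the slider value and pixel range are assumptions:

import java.util.Random;

// Hypothetical helper, not Weka's actual code: spread overlapping points by
// adding a small random offset whose size is controlled by a "jitter" fraction.
public class JitterDemo {
    private static final Random RNG = new Random();

    // x, y: plotted coordinates in pixels; jitter: slider value in [0, 1];
    // range: how far (in pixels) a point may be displaced at maximum jitter.
    static double[] jitter(double x, double y, double jitter, double range) {
        double dx = (RNG.nextDouble() - 0.5) * 2 * jitter * range;
        double dy = (RNG.nextDouble() - 0.5) * 2 * jitter * range;
        return new double[] { x + dx, y + dy };
    }

    public static void main(String[] args) {
        // Several instances at the same data point spread out once jittered.
        for (int i = 0; i < 5; i++) {
            double[] p = jitter(100, 100, 0.3, 20);
            System.out.printf("(%.1f, %.1f)%n", p[0], p[1]);
        }
    }
}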

I don't know the products you mention, but jittering generally means randomising the sample positions. E.g., in ray tracing you would normally trace a ray through each pixel on the screen. Jittering adds a random offset to each ray to reduce artifacts caused by regular aliasing.

Related

Need Conceptual Help Rendering a Heat Map

I need to create a heatmap for Android Google Maps. I have geolocated points with negative and positive weights attributed to them that I would like to represent visually. Unlike most heatmaps, I want these positive and negative weights to interfere destructively; that is, when two points are close to each other and one is positive and the other is negative, their contributions cancel, and areas that cancel out completely are effectively not rendered.
I plan on using the Android Google Maps TileOverlay/TileProvider class, whose job is to create/render tiles based on a given location and zoom. (I don't have an issue with this part.)
How should I go about rendering these Tiles? I plan on using java's Graphics class but the best that I can think of is going through each pixel, calculating what color it should be based on the surrounding data points, and rendering that pixel. This seems very inefficient, however, and I was looking for suggestions on a better approach.
Edit: I've considered everything from using a non-Android Google Map inside a WebView, to a TileOverlay, to a GroundOverlay. What I am now considering is a large two-dimensional array of "squares." Each square would have a longitude, latitude, and total +/- weight. When a new data point is added, instead of rendering it exactly where it is, it is added to the "square" it falls in; the weight of the data point is added to the square, and I then use the GoogleMap Polygon object to render the square on the map. The ratio of +points to -points determines the color that is rendered, with a ratio close to 1:1 being clear, >1 being blue (cold point), and <1 being red (hot point).
Edit: a.k.a. clustering the data into small regional groups
I suggest trying
going through each pixel, calculating what color it should be based on the surrounding data points, and rendering that pixel.
Even if it is slow, it will work. There are not too many tiles on the screen, there are not too many pixels in each tile, and all of this is done on a background thread.
All of this is still followed by translating the Bitmap into a byte[]. The byte[] is a representation of a PNG or JPG file, so it's not a simple pixel dump of the Bitmap. That last operation takes some time too and may well require more processing power than your whole algorithm.
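As a rough sketch (assuming you implement TileProvider from the Google Maps Android API), the per-pixel loop and the PNG encoding could look roughly like this; weightAt() and colorFor() are placeholders for your own logic, and the tile x/y/zoom to lat/lng math is omitted:

import android.graphics.Bitmap;
import android.graphics.Color;
import com.google.android.gms.maps.model.Tile;
import com.google.android.gms.maps.model.TileProvider;
import java.io.ByteArrayOutputStream;

// Rough sketch only: per-pixel tile rendering followed by PNG encoding.
// weightAt() is a placeholder for your own lookup of nearby +/- weights.
public class HeatTileProvider implements TileProvider {
    private static final int TILE_SIZE = 256;

    @Override
    public Tile getTile(int x, int y, int zoom) {
        Bitmap bitmap = Bitmap.createBitmap(TILE_SIZE, TILE_SIZE, Bitmap.Config.ARGB_8888);
        for (int px = 0; px < TILE_SIZE; px++) {
            for (int py = 0; py < TILE_SIZE; py++) {
                double w = weightAt(x, y, zoom, px, py); // net +/- weight near this pixel
                bitmap.setPixel(px, py, colorFor(w));
            }
        }
        // Encode the Bitmap as a PNG byte[]; this step is not free either.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        bitmap.compress(Bitmap.CompressFormat.PNG, 100, out);
        return new Tile(TILE_SIZE, TILE_SIZE, out.toByteArray());
    }

    private double weightAt(int tileX, int tileY, int zoom, int px, int py) {
        return 0; // placeholder: sum nearby positive/negative weights here
    }

    private int colorFor(double w) {
        if (Math.abs(w) < 0.05) return Color.TRANSPARENT; // cancels out -> draw nothing
        int alpha = (int) Math.min(200, Math.abs(w) * 255);
        return w > 0 ? Color.argb(alpha, 0, 0, 255) : Color.argb(alpha, 255, 0, 0);
    }
}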
Edit (moved from comment):
What you describe in the edit sounds like a simple clustering on LatLng. I can't say it's a better or worse idea, but it's something worth a try.
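If you go the clustering route, a minimal sketch of bucketing points into a lat/lng grid and accumulating their weights might look like this; the cell size and the WeightedPoint type are assumptions:

import java.util.HashMap;
import java.util.Map;

// A minimal sketch of the "squares" idea: bucket points into a lat/lng grid and
// accumulate their weights. The net weight per cell then decides its colour.
public class GridClusterer {
    static class WeightedPoint {
        final double lat, lng, weight;
        WeightedPoint(double lat, double lng, double weight) {
            this.lat = lat; this.lng = lng; this.weight = weight;
        }
    }

    private final double cellSizeDegrees;
    private final Map<String, Double> cells = new HashMap<>();

    GridClusterer(double cellSizeDegrees) {
        this.cellSizeDegrees = cellSizeDegrees;
    }

    void add(WeightedPoint p) {
        long row = (long) Math.floor(p.lat / cellSizeDegrees);
        long col = (long) Math.floor(p.lng / cellSizeDegrees);
        cells.merge(row + ":" + col, p.weight, Double::sum);
    }

    Map<String, Double> netWeights() {
        return cells;
    }
}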

Find and Crop relevant image area automatically

We are trying to crop the relevant area of an image (photo) with a square aspect ratio (1:1), similar to what Facebook does when creating thumbnails.
In our case, it doesn't really matter whether the crop keeps the original height (or width, when the image orientation is portrait, h > w) of the image being processed, or whether the auto-crop resizes the result as well.
I am thinking of algorithms like comparing objects with the background, or focus, or something like a heat map, combining colors and/or areas to find the most relevant part. There could be several ideas/methods for finding the main part of the image to use, similar to face detection.
We are looking for a Java (Android)-based solution or anything that can be adapted for Java / Android. Any help or idea would be greatly appreciated! Thank you!
I would do this in two steps, where the initial step is more robust and the second could be based on, for example, entropy. For the first step, you can use SURF, which is relatively common nowadays, and I would expect to find Java implementations of it. SURF will give you a set of key points that it considers important for describing your image. Considering where these key points are in your image, you have a set of (x, y) coordinates that you can use to reduce the area of your initial image to the region that encloses this set of points.

Now, since these key points might be anywhere in your image, you will probably want to discard some of them (i.e., those that are too far from the others: outliers). A very simple way to do this discarding step is to consider the convex hull of the initial set of key points; from there, you can peel this hull multiple times. Each time you "peel" it, you are effectively discarding the points on the current convex hull.
Here is a sample of that first step (Mathematica code, but the same can be done with any SURF/keypoint implementation):
f = Import["http://fohn.net/duck-pictures-facts/mallard-duck.jpg"];
kp = ImageKeypoints[f, MaxFeatures -> 200];
Show[f, Graphics[{PointSize[Medium], Red, Point[kp]}]]
After peeling the convex hull formed by the key points once, and trimming the image to the bounding rectangle of the remaining points, you get a much tighter crop around the subject.
From that cropped region, you can decide which sub-region to pick based on some other method. One that is apparently common is the approach used by Reddit, which successively removes slices of lower entropy from the image. Quickly searching for it, I found one such implementation at https://github.com/christopherhan/pycrop/blob/master/pycrop.py#L33; it is very simple.
Another kind of method you might want to try is called seam carving. Also note that, depending on how large the initial image is, cropping a small piece of it is unlikely to give anything relevant. In those cases, it is more interesting to first resize the image and then apply the relevant methods.
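A minimal Java sketch of the "crop to the bounding box of the key points" step, assuming the key points come from a detector such as SURF (e.g., via OpenCV or BoofCV) and are handed over as a list of java.awt.Point; the crude "drop the farthest points" trimming below stands in for proper convex hull peeling:

import java.awt.Point;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of "crop to the interesting points". Keypoint detection is assumed to
// happen elsewhere; this only trims outliers and crops to the bounding rectangle.
public class KeypointCrop {

    static BufferedImage cropToKeypoints(BufferedImage img, List<Point> keypoints, double trimFraction) {
        List<Point> pts = new ArrayList<>(keypoints);
        int toDrop = (int) (pts.size() * trimFraction);

        // Drop the points farthest from the centroid (a stand-in for hull peeling).
        double cx = pts.stream().mapToDouble(p -> p.x).average().orElse(0);
        double cy = pts.stream().mapToDouble(p -> p.y).average().orElse(0);
        pts.sort(Comparator.comparingDouble(p -> -Math.hypot(p.x - cx, p.y - cy)));
        pts = pts.subList(toDrop, pts.size());

        // Bounding rectangle of the remaining points.
        int minX = pts.stream().mapToInt(p -> p.x).min().orElse(0);
        int minY = pts.stream().mapToInt(p -> p.y).min().orElse(0);
        int maxX = pts.stream().mapToInt(p -> p.x).max().orElse(img.getWidth() - 1);
        int maxY = pts.stream().mapToInt(p -> p.y).max().orElse(img.getHeight() - 1);
        return img.getSubimage(minX, minY, Math.max(1, maxX - minX), Math.max(1, maxY - minY));
    }
}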

find perceptual similarity between two images in java

I'm looking for an algorithm that can find the perceptual similarity between two images. I want to input one picture into the system, have it search my whole database (which contains a huge number of pictures), and retrieve the images that are most perceptually similar to the source image. Could anybody please help me?
I mean I want to find similar pictures. I heard some algorithms can find similar pictures based on the source picture's shape, color, etc. (pixel by pixel). I want a system where I input the source image and it retrieves similar images based on perceptual features like shape, color, size, and so on.
Thanks
You need to define carefully what 'perceptually similar' means to you before trying to find a measurable quantity that captures it. Imagine a picture of a grass field under a blue sky with a horse. Should your application retrieve all horse pictures? Or all pictures with green grass and a blue sky? In the latter case, the above-mentioned color histograms are a good start. Alternatively, you could look at Gaussian mixture models (GMMs); they are used quite a bit in retrieval. This code could be a starting point, as could the article "Image retrieval using color histograms generated by Gauss mixture vector quantization".
More complicated is the so-called "bag of words" or "visual words" approach. It is increasingly used for image categorization and identification. This algorithm usually starts by detecting robust points in an image, meaning points that will survive certain image distortions. Popular example algorithms are SIFT and SURF. The region around each found point is captured with a descriptor, which could for example be a smart histogram.
In the simplest form, one can collect all the descriptors from all images and cluster them, for example using k-means. Every original image then has descriptors that contribute to a number of clusters. The centroids of these clusters, i.e. the visual words, can be used as a new descriptor for the image. The VLFeat website contains a nice demo of this approach, classifying the Caltech 101 dataset. Also noteworthy are the results and software from Caltech itself.
One simple way to start is comparing color histograms.
But the following article proposes the use of joint histograms instead. You may also take a look:
http://www.cs.cornell.edu/rdz/joint-histograms.html
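As a starting point, here is a minimal sketch of a plain RGB color histogram comparison in Java; the 4-bits-per-channel binning and the histogram intersection measure are arbitrary choices, not taken from the article:

import java.awt.image.BufferedImage;

// A minimal color-histogram comparison (histogram intersection), just to get started.
public class ColorHistogramSimilarity {

    // 4 bits per channel -> 16 x 16 x 16 = 4096 bins, normalized to sum to 1.
    static double[] histogram(BufferedImage img) {
        double[] h = new double[16 * 16 * 16];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 20) & 0xF, g = (rgb >> 12) & 0xF, b = (rgb >> 4) & 0xF;
                h[(r << 8) | (g << 4) | b]++;
            }
        }
        double total = (double) img.getWidth() * img.getHeight();
        for (int i = 0; i < h.length; i++) h[i] /= total; // so image size doesn't matter
        return h;
    }

    // 1.0 = identical color distribution, 0.0 = no overlap at all.
    static double intersection(double[] h1, double[] h2) {
        double s = 0;
        for (int i = 0; i < h1.length; i++) s += Math.min(h1[i], h2[i]);
        return s;
    }
}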

FFT Image to Measure Similarities

Ok I'm writing a small Java app that accepts two images as inputs, compares them, then gives a quantitative output as a measure of similarity (eg. 50% similar).
To my understanding FFT is a good way to measure similarity of two images. But I can't for the love of god figure out how to code/implement it.
So far I've implemented another function which basically gives me two histograms (one for each image). All I need now is to write a method that will FFT an image and give me a quantifiable outcome.
Can anyone help me out with this? I'd really like to see some sample codes, if not at least a point in the right direction. Much thanks in advance.
Similarity is not an exact term. For example: if you have a circle and an ellipse, are they similar? They are both round objects, so in that sense they are; but if we want to filter out only circles, they are not. You will have to define a measure (or measures, for example roundness, intensity distribution, size, orientation, number of objects, Euler number, etc.), then calculate it for each image. The similarity of the two images will be (some kind of) distance between the two calculated values. This could be Euclidean distance (for two real-valued measures), or some kind of error function (RMS for intensity distributions).
You will have to choose which transforms your measure should stay invariant to (is the rotated image similar to the original? If yes, a simple Fourier transform is not appropriate).
Measuring the similarity of images is hard; if you have to do that, I would read about image stitching. If you just need to distinguish blobs, first try calculating some simple measures (I recommend calculating moments: area, orientation; also read about k-means clustering), or a 1D Fourier transform of the distance of the contour from the center of mass (which is a little bit more difficult).
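A tiny Java illustration of the "pick measures, then take a distance between them" idea; the two measures used here (mean intensity and aspect ratio) are arbitrary examples, not a recommendation:

import java.awt.image.BufferedImage;

// Compute a small vector of measures per image, then compare images by the
// Euclidean distance between their measure vectors (smaller = more similar).
public class SimpleMeasures {

    static double[] measures(BufferedImage img) {
        double sum = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                sum += (r + g + b) / 3.0;
            }
        }
        double meanIntensity = sum / (img.getWidth() * (double) img.getHeight());
        double aspectRatio = img.getWidth() / (double) img.getHeight();
        return new double[] { meanIntensity / 255.0, aspectRatio };
    }

    static double distance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(d);
    }
}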
Before you attempt to code up a 2DFT, you should fully understand the math behind it. flolo is correct that you can compute it by first doing a 1D FFT on the rows and columns and then combining the results, but I have no reason to believe the L_inf norm is the best way to convert them to a metric, since it completely skips the usual combining step to create the full 2DFT. Take a look at http://fourier.eng.hmc.edu/e101/lectures/Image_Processing/node6.html at the very bottom of the page.
That said, there may be better ways to compare images that don't require comparing 2D arrays of information. For instance, PCA (Principal Component Analysis, which is just a matter of running SVD, Singular Value Decomposition, on your images after mean-centering them; I'd take a look at the Wikipedia article on it first) will give you a 1D vector, to which you could then apply some L_p norm directly for the comparison, although in this case I would use something like sum(min(a_i/b_i, b_i/a_i))/length(a), where a and b are the 1D vectors you got from the transform.
There are many good sites with code for an FFT on a 1-D array of values. You just apply this FFT row by row to your image, and afterwards you do the FFT column-wise on the results.
Now you need a metric on the resulting transformed images; my suggestion would be to try the max-norm (L_inf), that is, max_{x,y} |fft2d(img1)[x,y] - fft2d(img2)[x,y]|.
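A self-contained sketch of that row-then-column approach in Java; it assumes grayscale images whose width and height are powers of two, and (as one possible choice, not the only one) compares the magnitude spectra with the max-norm:

// Row-then-column FFT plus a max-norm (L_inf) comparison of the magnitude spectra.
// Assumes both inputs are grayscale arrays with the same power-of-two dimensions.
public class Fft2dCompare {

    // In-place radix-2 Cooley-Tukey FFT on a complex signal (re, im).
    static void fft(double[] re, double[] im) {
        int n = re.length;
        if (n == 1) return;
        double[] evenRe = new double[n / 2], evenIm = new double[n / 2];
        double[] oddRe = new double[n / 2], oddIm = new double[n / 2];
        for (int i = 0; i < n / 2; i++) {
            evenRe[i] = re[2 * i];     evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];  oddIm[i] = im[2 * i + 1];
        }
        fft(evenRe, evenIm);
        fft(oddRe, oddIm);
        for (int k = 0; k < n / 2; k++) {
            double a = -2 * Math.PI * k / n;
            double tr = Math.cos(a) * oddRe[k] - Math.sin(a) * oddIm[k];
            double ti = Math.cos(a) * oddIm[k] + Math.sin(a) * oddRe[k];
            re[k] = evenRe[k] + tr;          im[k] = evenIm[k] + ti;
            re[k + n / 2] = evenRe[k] - tr;  im[k + n / 2] = evenIm[k] - ti;
        }
    }

    // 2D FFT: 1D FFT on every row, then on every column. Returns magnitudes.
    static double[][] fft2dMagnitude(double[][] pixels) {
        int h = pixels.length, w = pixels[0].length;
        double[][] re = new double[h][], im = new double[h][w];
        for (int y = 0; y < h; y++) re[y] = pixels[y].clone();
        for (int y = 0; y < h; y++) fft(re[y], im[y]);           // rows
        for (int x = 0; x < w; x++) {                             // columns
            double[] cr = new double[h], ci = new double[h];
            for (int y = 0; y < h; y++) { cr[y] = re[y][x]; ci[y] = im[y][x]; }
            fft(cr, ci);
            for (int y = 0; y < h; y++) { re[y][x] = cr[y]; im[y][x] = ci[y]; }
        }
        double[][] mag = new double[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                mag[y][x] = Math.hypot(re[y][x], im[y][x]);
        return mag;
    }

    // Max-norm of the difference of the two magnitude spectra.
    static double maxNormDifference(double[][] img1, double[][] img2) {
        double[][] m1 = fft2dMagnitude(img1), m2 = fft2dMagnitude(img2);
        double max = 0;
        for (int y = 0; y < m1.length; y++)
            for (int x = 0; x < m1[0].length; x++)
                max = Math.max(max, Math.abs(m1[y][x] - m2[y][x]));
        return max;
    }
}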
If you just want to check whether one image is likely a quick edit of another (for something like DRM of stock photography), then check the percentages of a normalized color palette within probable regions. If they match within a THRESHOLD for a NUMBER_OF_TEST_COLORS in any one of a number of TEST_REGIONS within the image, then you have a "suspect"... you still need a human to check the suspects. But this is a quick and dirty way to catch many of the image re-sizers, horizontal/vertical flippers, background color changers, file format changers, and other subtle variations. Of course, "normalizing the colors" to a quantized palette is an art unto itself; I would recommend quantizing images to the nearest "web safe" colors for practicality.
I'm a blue collar garbage man in comparison to a mathematician, but garbage men are quite practical! I have had good success with this kind of approach in grouping similar images and search by color applications.
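A quick-and-dirty sketch of that palette idea in Java; the web-safe quantization, the rectangular test regions, and the threshold comparison are all choices you would tune yourself:

import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;

// Quantize a region to web-safe colors and compare the percentage of each color.
public class PaletteCheck {

    // Snap each channel to the nearest multiple of 51 (the web-safe palette).
    static int toWebSafe(int rgb) {
        int r = Math.round(((rgb >> 16) & 0xFF) / 51f) * 51;
        int g = Math.round(((rgb >> 8) & 0xFF) / 51f) * 51;
        int b = Math.round((rgb & 0xFF) / 51f) * 51;
        return (r << 16) | (g << 8) | b;
    }

    // Fraction of pixels per web-safe color inside a rectangular test region.
    static Map<Integer, Double> regionPalette(BufferedImage img, int x0, int y0, int w, int h) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int y = y0; y < y0 + h; y++)
            for (int x = x0; x < x0 + w; x++)
                counts.merge(toWebSafe(img.getRGB(x, y)), 1, Integer::sum);
        Map<Integer, Double> percents = new HashMap<>();
        double total = (double) w * h;
        counts.forEach((color, c) -> percents.put(color, c / total));
        return percents;
    }

    // "Suspect" if the color fractions of the two regions agree within a threshold.
    static boolean isSuspect(Map<Integer, Double> a, Map<Integer, Double> b, double threshold) {
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            double other = b.getOrDefault(e.getKey(), 0.0);
            if (Math.abs(e.getValue() - other) > threshold) return false;
        }
        return true;
    }
}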

Algorithm to determine which points should be visible on a map based on zoom

I'm making a Google Maps-like application for a course at my Uni (not something complex, it should load the map of a city for example, not the whole world). The map can have many layers, including markers (restaurants, hospitals, etc.)
The problem is that when you have many points and you zoom the map out, it doesn't look right. At that zoom level only some points need to be visible (and at the maximum map size, all points).
The question is: how can you determine which points should be visible for a specified zoom level?
Because I have implemented a PR quadtree to speed up rendering, I thought that I could define some "high-priority" markers (always visible, defined in the map editor) and put them in a queue. At each step a marker is removed from the queue, and all of its neighbors that are at least D units away (D depends on the zoom level) are chosen and inserted into the queue, and so on.
Is there any better way than the algorithm I thought of?
Thanks in advance!
I had a similar problem, and you cannot avoid having overlapping icons regardless of the method used to mark some icons as high-priority.
What I did (it might not be easy to apply in your case; in mine the map was rendered in a desktop application, with much more control over the rendering process) was to sort the markers by priority and paint only those that do not overlap, also showing a message like "XXX overlapping markers removed". This way the user doesn't become overwhelmed with information and still sees the most important markers.
I hope this helps.
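A minimal sketch of that "sort by priority, draw only what doesn't overlap" idea; the Marker type, screen coordinates, and distance check are assumptions, and the O(n^2) overlap test could be sped up with the PR quadtree you already have:

import java.util.ArrayList;
import java.util.List;

// Greedy thinning: keep the highest-priority markers that don't crowd each other.
public class MarkerThinning {
    static class Marker {
        final double screenX, screenY;
        final int priority; // higher = more important
        Marker(double screenX, double screenY, int priority) {
            this.screenX = screenX; this.screenY = screenY; this.priority = priority;
        }
    }

    // Returns the markers to draw at this zoom level; minDistance depends on zoom.
    static List<Marker> visibleMarkers(List<Marker> all, double minDistance) {
        List<Marker> sorted = new ArrayList<>(all);
        sorted.sort((a, b) -> Integer.compare(b.priority, a.priority));
        List<Marker> visible = new ArrayList<>();
        for (Marker m : sorted) {
            boolean overlaps = false;
            for (Marker v : visible) {
                if (Math.hypot(m.screenX - v.screenX, m.screenY - v.screenY) < minDistance) {
                    overlaps = true;
                    break;
                }
            }
            if (!overlaps) visible.add(m);
            // else: count it toward an "XXX overlapping markers removed" message
        }
        return visible;
    }
}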
Not sure I understand completely, but perhaps you can assign each layer a "neighborhood density," based on the inverse of the average distance from each point to its closest neighbor. For a particular zoom level, you would calculate a maximum density that is comfortably viewable and use that as a threshold.
I have some experience in designing a map application from scratch, so I would recommend splitting the entire world into 16 zoom levels. Zoom level 0 should show the entire world and zoom level 15 should cover the street data.
Typically, you would use zoom levels 0 to 3 for country boundaries, and each zoom level should cover a range 1/4th of the previous level's. You can keep a zoom-level-to-table map (assuming you are using a database to store the spatial data). Once you define the zoom levels and the mapping from zoom ranges to tables, you have finer control when querying the data. It is also recommended to build an R-tree index for your map data.
Every time you add a layer/table (a layer can be a country boundary, a railway track, or a street), it is advisable to assign it a zoom level yourself instead of spending time finding an algorithm, since the number of layers is not going to be huge.
I can keep going, but if you want specific answers, I can address them as well.
