I'm using metadata-extractor to write a Java application that organizes images and finds duplicates. The API is great, but there's something I cannot figure out.
Suppose I have two JPG images. Visually, these images are exactly the same (i.e. identical pixel for pixel). However, something within the metadata embedded in the file may differ.
If I calculate MD5 hashes on each complete file, I will get two different hashes. However, I want to calculate a hash of only the image/pixel data, which would yield the same hash for both files.
So, is there a way to pull the raw image/pixel data out of the JPG using metadata-extractor so that I can calculate my hash on that?
Also, is Javadoc available for this API? I cannot seem to find it.
You can achieve this using the library's JpegSegmentReader class. It'll let you pull out the JPEG segments that contain image data and ignore metadata segments.
I discussed this technique in another answer and the asker indicated they had success with the approach.
This would actually make a nice sample application for the library. If you come up with something and feel like sharing, please do.
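To make the idea concrete, here is a rough, library-free sketch that hashes only the non-metadata parts of a JPEG: it walks the file's segments, skips the APPn and COM (metadata/comment) segments, and feeds everything else, including the entropy-coded scan data, into an MD5 digest. It is a simplification (it ignores padding bytes and assumes a well-formed file), and the class and method names are just for illustration; with metadata-extractor you would lean on JpegSegmentReader rather than parsing the markers yourself.

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.security.MessageDigest;

public class JpegImageDataHash {

    // Hashes a JPEG file while skipping the APPn (0xE0-0xEF) and COM (0xFE)
    // metadata segments, so two files that differ only in metadata should
    // produce the same digest.
    public static byte[] hash(File file) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            in.readUnsignedShort();                       // SOI marker (0xFFD8)
            while (true) {
                int marker = in.readUnsignedShort();      // 0xFFxx segment marker
                if (marker == 0xFFD9) {                   // EOI: end of image
                    break;
                }
                if (marker == 0xFFDA) {                   // SOS: entropy-coded pixel data follows
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) > 0) {
                        digest.update(buf, 0, n);
                    }
                    break;
                }
                int length = in.readUnsignedShort();      // segment length, includes these 2 bytes
                byte[] payload = new byte[length - 2];
                in.readFully(payload);
                int type = marker & 0xFF;
                boolean isMetadata = (type >= 0xE0 && type <= 0xEF) || type == 0xFE;
                if (!isMetadata) {
                    digest.update(payload);               // hash only image-related segments
                }
            }
        }
        return digest.digest();
    }
}

If two files produce the same digest here, they almost certainly carry identical image data, though a byte-for-byte comparison of the retained segments is the only way to be completely sure.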
I have an image file, and I need to determine if a specified area of this image contains a signature. Or to put it in end-user terms, "Has this document been signed?"
What I have done so far is to examine all the pixels contained in the area, to calculate an average "darkness", and compare that to a reference value. If the difference in darkness exceeds some threshold, then I consider it signed.
The problem with this (admittedly simplistic) approach is that, because the pixels of the signature itself are such a small fraction of the area, I have to use a very low darkness threshold, which results in a large number of false positives. I can't distinguish a real signature from stray markings, smudges, fax artifacts, etc.
To be clear...I'm not trying to match any specific signature or set of signatures. That is, I don't care who signed it, only whether it is signed.
Is anyone aware of a Java library that can do this, or of a better approach to this problem than what I am currently doing?
EDIT:
This is an example of the kinds of images I am working with. This document would be faxed to the recipient, signed and faxed back. It won't be this clean-looking by the time I need to look for a signature.
I do not know of any simple solutions. You could wrap queXF or write something similar in Java. This paper talks about a color code algorithm for recognizing signatures.
This is what I believe can be done (although it is not a very good solution), but it may still work. It would involve a bit of machine learning. I am assuming that your image does not contain handwritten text and it's just an image.
The first thing to do would be to create a dataset of images which contain a signature and images which do not. The positive samples of the dataset should contain only signatures (you can learn a classifier for multiple aspect ratios) and the negative samples should contain random images of the same aspect ratio/dimension. Now, you can compute some feature over these samples (HoG can be used as a feature, although I do not claim it is the best one for this application) and learn an SVM for each aspect ratio.
The next step would be to slide a detection window (of different aspect ratios) across the image, run the SVMs you have learnt on each window, and check whether any of them gives a positive response.
Although this approach may not always work, it should give a decent amount of accuracy. The more data you use for training, the better the results will get (and if you can come up with a good feature vector to represent a signature, it will help your case even further).
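For what it's worth, here is a rough sketch of that pipeline using OpenCV's Java bindings (my choice of library, not something prescribed above). The HOGDescriptor parameters, the 128x64 window size, and the class name are all arbitrary illustrations; the sliding-window loop over multiple aspect ratios is left out, and the patches are assumed to be grayscale Mats. Treat it as a starting point rather than a working detector.

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfInt;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;
import org.opencv.ml.Ml;
import org.opencv.ml.SVM;
import org.opencv.objdetect.HOGDescriptor;

import java.util.List;

public class SignatureClassifier {

    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); } // assumes the OpenCV native library is installed

    // All patches are resized to this fixed window before computing HoG features.
    // The window and HOGDescriptor parameters below are arbitrary starting points.
    private static final Size WINDOW = new Size(128, 64);

    private final HOGDescriptor hog = new HOGDescriptor(
            WINDOW, new Size(16, 16), new Size(8, 8), new Size(8, 8), 9);
    private final SVM svm;

    public SignatureClassifier() {
        svm = SVM.create();
        svm.setType(SVM.C_SVC);
        svm.setKernel(SVM.LINEAR);
    }

    // Computes a HoG feature vector for one grayscale (CV_8UC1) patch.
    private float[] features(Mat patch) {
        Mat resized = new Mat();
        Imgproc.resize(patch, resized, WINDOW);
        MatOfFloat descriptor = new MatOfFloat();
        hog.compute(resized, descriptor);
        return descriptor.toArray();
    }

    // positives: patches containing only a signature; negatives: random patches.
    public void train(List<Mat> positives, List<Mat> negatives) {
        int dims = features(positives.get(0)).length;
        Mat samples = new Mat(positives.size() + negatives.size(), dims, CvType.CV_32F);
        int[] labels = new int[positives.size() + negatives.size()];
        int row = 0;
        for (Mat p : positives) { samples.put(row, 0, features(p)); labels[row++] = 1; }
        for (Mat n : negatives) { samples.put(row, 0, features(n)); labels[row++] = 0; }
        svm.train(samples, Ml.ROW_SAMPLE, new MatOfInt(labels));
    }

    // True if the trained SVM classifies the given region as containing a signature.
    public boolean containsSignature(Mat region) {
        float[] f = features(region);
        Mat sample = new Mat(1, f.length, CvType.CV_32F);
        sample.put(0, 0, f);
        return svm.predict(sample) > 0.5f;
    }
}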
WhatsApp creates duplicate copies of images upon sharing. Although the resolution of the images is the same, the MD5 checksums of the original image and its copy are different. Why is this? How do I get my app to recognize that this is a duplicate image?
I've tried MD5 and Sha-1, both of the algorithms generated different checksums for the two images.
Sounds like there's probably a difference in the metadata - e.g. the timestamp might have been changed by the WhatsApp servers when the copy was made.
I suggest you retrieve the pixel data for the images and run your checksums on that. You can use the Bitmap.getPixels() method. e.g.: myBitmap.getPixels(pixels, 0, myBitmap.getWidth(), 0, 0, myBitmap.getWidth(), myBitmap.getHeight());
Remember, just because the checksums are the same, that doesn't necessarily mean the images are! If your checksums match, you'll have to do an element-by-element comparison of the data to be 100% sure that the images are identical.
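As a rough sketch of what that might look like (the class and method names here are my own; Bitmap.getPixels() and MessageDigest are the real APIs being combined):

import android.graphics.Bitmap;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public final class PixelHash {

    // Returns an MD5 digest of the bitmap's pixel data only, so two files that
    // differ only in their metadata should produce the same digest.
    public static byte[] md5OfPixels(Bitmap bitmap) throws NoSuchAlgorithmException {
        int[] pixels = new int[bitmap.getWidth() * bitmap.getHeight()];
        bitmap.getPixels(pixels, 0, bitmap.getWidth(), 0, 0,
                bitmap.getWidth(), bitmap.getHeight());

        // Pack the ARGB ints into bytes so MessageDigest can consume them.
        ByteBuffer buffer = ByteBuffer.allocate(pixels.length * 4);
        for (int pixel : pixels) {
            buffer.putInt(pixel);
        }
        return MessageDigest.getInstance("MD5").digest(buffer.array());
    }
}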
Edit:
There's a good example of how to do a pixel-by-pixel test for equality here. Note you can use the Bitmap.sameAs() method if you're using API 12+!
First of all, sorry for my English and for the length of the message.
I'm writing a simple application in Java for visual cryptography for a school project. It takes a schema file and a secret image, then creates n images (shares) using the information contained in the schema.
For each pixel in the secret image, the application looks up a matrix in the schema file and writes m pixels into each of the n shares (one matrix row per share).
A schema file contains the matrices (n*m) for every color needed for encoding, and it is composed as follows:
COLLECTION COLOR 1
START MATRIX 1
RGB
GBR
BGR
END
START MATRIX 2
.....
COLLECTION COLOR 2
START MATRIX 1
XXX
XXX
XXX
END
......
//
This file can be a few lines or many thousands of lines long, so I can't keep all the matrices in memory in the application; I always need to read them from the file.
To test the performance I created a parser that simply searches for the matrix line by line, but it is very slow.
I thought I'd save the line number of each matrix and then use RandomAccessFile to read it, but I wanted to know if there is a more efficient way of doing this.
Thanks
If you are truly dealing with massive, massive input files that exceed your ability to load the entire thing into RAM, then using a persistent key/value store like MapDB may be an easy way to do this. Parse the file once and build an efficient [Collection+Color]->Matrix map. Store that in a persistent HTree. That'll take care of all of the caching, etc. for you. Make sure to create a good hash function for the Collection+Color tuple, and it should be very performant.
If your data access pattern tends to clump together, it may be faster to store in a B+Tree index - you can play with that and see what works best.
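As a very rough sketch of what this might look like with MapDB 3.x (the builder API differs between MapDB versions, and the store name, key format, and matrix encoding below are just illustrations):

import org.mapdb.DB;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;
import org.mapdb.Serializer;

public class MatrixStore {

    public static void main(String[] args) {
        // File-backed store: survives restarts, so the schema only has to be parsed once.
        DB db = DBMaker.fileDB("matrices.db").make();

        // Key: a "collection|color|matrixIndex" string built while parsing the
        // schema file; value: the matrix rows joined with '\n'.
        HTreeMap<String, String> matrices = db
                .hashMap("matrices", Serializer.STRING, Serializer.STRING)
                .createOrOpen();

        // During the one-off parse of the schema file:
        matrices.put("1|RED|1", "RGB\nGBR\nBGR");

        // During encoding, lookups are cheap and cached by MapDB:
        String matrix = matrices.get("1|RED|1");
        System.out.println(matrix);

        db.close();
    }
}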
For your schema file, use a FileChannel and call .map() on it. With a little effort, you can calculate the necessary offsets into the mapped representation of the file and use that, or even encapsulate this mapping into a custom structure.
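A minimal sketch of that idea, assuming the schema file fits in a single mapping (a MappedByteBuffer is limited to 2 GB) and leaving the one-off index-building scan as an exercise; the class name and index layout are purely illustrative:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

public class MappedSchemaReader {

    private final MappedByteBuffer buffer;
    // Byte offset and length of each matrix, keyed by an identifier you build
    // during one initial scan of the file.
    private final Map<String, int[]> index = new HashMap<>();

    public MappedSchemaReader(Path schemaFile) throws IOException {
        try (FileChannel channel = FileChannel.open(schemaFile, StandardOpenOption.READ)) {
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
        // ... scan the buffer once here, recording the offset and length of every
        // "START MATRIX ... END" block via addToIndex() ...
    }

    public void addToIndex(String key, int offset, int length) {
        index.put(key, new int[] { offset, length });
    }

    // Reads one matrix directly from the mapped file, without re-parsing it.
    public String readMatrix(String key) {
        int[] entry = index.get(key);
        byte[] bytes = new byte[entry[1]];
        buffer.position(entry[0]);
        buffer.get(bytes);
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}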
I'm looking for several methods to compare two images to see how similar they are. Currently I plan to have percentages as the 'similarity index' end-result. My program outline is something like this:
User selects 2 images to compare.
With a button, the images are compared using several different methods.
At the end, each method will have a percentage next to it indicating how similar the images are based on that method.
I've done a lot of reading lately and some of the stuff I've read seems to be incredibly complex and advanced and not for someone like me with only about a year's worth of Java experience. So far I've read about:
The Fourier Transform - I'm finding this rather confusing to implement in Java, but apparently the Java Advanced Imaging API has a class for it, though I'm not sure how to convert the output into an actual result.
SIFT algorithm - seems incredibly complex
Histograms - probably the easiest out of all mentioned so far
Pixel grabbing - seems viable, but if there's a considerable amount of variation between the two images it doesn't look like it's going to produce any sort of accurate result. I might be wrong?
I also have the idea of pre-processing an image using a Sobel filter first, then comparing it. Problem is the actual comparing part.
So yeah, I'm looking to see if anyone has ideas for comparing images in Java, and hoping that there are people here who have done similar projects before. I just want to get some input on viable comparison techniques that aren't too hard to implement in Java.
Thanks in advance
Fourier Transform - This can be used to efficiently compute the cross-correlation, which will tell you how to align the two images and how similar they are once they are optimally aligned.
SIFT descriptors - These can be used to compare local features. They are often used for correspondence analysis and object recognition. (See also SURF.)
Histograms - The normalized cross-correlation often yields good results for comparing images on a global level (a minimal Java sketch follows this list). But since you are just comparing color distributions, you could end up declaring an outdoor scene with lots of snow as similar to an indoor scene with lots of white wallpaper...
Pixel grabbing - No idea what this is...
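Since histograms are probably the easiest of these to start with, here is a minimal plain-Java sketch that builds normalized grayscale histograms with BufferedImage and scores them with histogram intersection (a simpler alternative to normalized cross-correlation; the class name and 256-bin choice are just illustrative):

import java.awt.image.BufferedImage;

public class HistogramSimilarity {

    // Builds a normalized 256-bin grayscale histogram of an image.
    private static double[] histogram(BufferedImage image) {
        double[] bins = new double[256];
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int luminance = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                bins[luminance]++;
            }
        }
        double total = image.getWidth() * image.getHeight();
        for (int i = 0; i < bins.length; i++) {
            bins[i] /= total;   // normalize so image size doesn't matter
        }
        return bins;
    }

    // Histogram intersection: 1.0 means identical distributions, 0.0 means disjoint.
    public static double similarity(BufferedImage a, BufferedImage b) {
        double[] ha = histogram(a);
        double[] hb = histogram(b);
        double score = 0;
        for (int i = 0; i < ha.length; i++) {
            score += Math.min(ha[i], hb[i]);
        }
        return score;   // multiply by 100 for a percentage-style result
    }
}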
You can get a good overview from this paper. Another field you might want to look into is content-based image retrieval (CBIR).
Sorry for not being Java specific. HTH.
As a better alternative to simple pixel grabbing, try SSIM. It does require that your images are essentially of the same object from the same angle, however. It's useful, for example, if you're comparing images that have been compressed with different algorithms (e.g. JPEG vs JPEG 2000). Also, it's a fairly simple approach that you should be able to implement reasonably quickly to see some results.
I don't know of a Java implementation, but there's a C++ implementation using OpenCV. You could try to re-use that (through something like javacv) or just write it from scratch. The algorithm itself isn't that complicated anyway, so you should be able to implement it directly.
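If you want a feel for the math before tackling a full implementation, here is a stripped-down, single-window variant of SSIM in plain Java. This is my own simplification: real SSIM computes the same statistics per local window (usually with Gaussian weighting) and averages the results, and this sketch assumes both images have identical dimensions.

import java.awt.image.BufferedImage;

public class SimpleSsim {

    // A single-window ("global") SSIM over two same-sized images, using luminance only.
    public static double ssim(BufferedImage a, BufferedImage b) {
        double[] x = luminance(a);
        double[] y = luminance(b);
        int n = x.length;

        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;

        double varX = 0, varY = 0, cov = 0;
        for (int i = 0; i < n; i++) {
            varX += (x[i] - meanX) * (x[i] - meanX);
            varY += (y[i] - meanY) * (y[i] - meanY);
            cov  += (x[i] - meanX) * (y[i] - meanY);
        }
        varX /= n - 1;
        varY /= n - 1;
        cov  /= n - 1;

        double c1 = Math.pow(0.01 * 255, 2);   // stabilizing constants from the SSIM paper
        double c2 = Math.pow(0.03 * 255, 2);

        return ((2 * meanX * meanY + c1) * (2 * cov + c2))
                / ((meanX * meanX + meanY * meanY + c1) * (varX + varY + c2));
    }

    private static double[] luminance(BufferedImage image) {
        double[] values = new double[image.getWidth() * image.getHeight()];
        int i = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                values[i++] = 0.299 * ((rgb >> 16) & 0xFF)
                            + 0.587 * ((rgb >> 8) & 0xFF)
                            + 0.114 * (rgb & 0xFF);
            }
        }
        return values;
    }
}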
Happy Holidays everyone! I know people's brains are switched off right now, but can you please give me solutions or best practices for caching images in Java or J2ME? I am trying to load images from a server (input stream) and I want to be able to store them so I don't have to retrieve them from the server every time they need to be displayed again. Thank you.
The approach you'll want probably depends on the number of images as well as their typical file size. For instance, if you're only likely to use a small number of images or small-sized images, the example provided by trashgod makes a lot of sense.
If you're going to be downloading a very large number of images, or images with very large file sizes, you may consider caching the images to disk first. Then, your application could load and later dispose the images as needed to minimize memory usage. This is the kind of approach used by web browsers.
The simplest approach is to use whatever collection is appropriate to your application and funnel all image access through a method that checks the cache first. In this example, all access is via an image's index, so getImage() manipulates a cache of type List<ImageIcon>. A Map<String, ImageIcon> would be a straightforward alternative.
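A minimal sketch of the Map-based alternative, keyed by URL (the class and method names are illustrative):

import javax.swing.ImageIcon;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class ImageCache {

    private final Map<String, ImageIcon> cache = new HashMap<>();

    // All image access goes through this method: hit the cache first, and only
    // go to the server when the image hasn't been seen before.
    public ImageIcon getImage(String url) {
        ImageIcon icon = cache.get(url);
        if (icon == null) {
            try {
                icon = new ImageIcon(new URL(url));   // downloads the image
                cache.put(url, icon);
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(e);
            }
        }
        return icon;
    }
}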
The only way to do this in J2ME is to save the images' raw byte arrays (i.e. the bytes you pass to Image.createImage()) somewhere persistent, possibly a file using JSR 75 but more likely a record store using RMS.
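A small sketch of the RMS approach (the record store name and class are my own; RecordStore and Image.createImage() are the standard MIDP APIs):

import javax.microedition.lcdui.Image;
import javax.microedition.rms.RecordStore;
import javax.microedition.rms.RecordStoreException;

public class RmsImageCache {

    // Stores the raw image bytes (as downloaded from the server) in a record
    // store and returns the record id so they can be reloaded later.
    public int store(byte[] imageData) throws RecordStoreException {
        RecordStore store = RecordStore.openRecordStore("imageCache", true);
        try {
            return store.addRecord(imageData, 0, imageData.length);
        } finally {
            store.closeRecordStore();
        }
    }

    // Rebuilds an Image from the cached bytes instead of hitting the server again.
    public Image load(int recordId) throws RecordStoreException {
        RecordStore store = RecordStore.openRecordStore("imageCache", true);
        try {
            byte[] data = store.getRecord(recordId);
            return Image.createImage(data, 0, data.length);
        } finally {
            store.closeRecordStore();
        }
    }
}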
HTH