Java compare two audio files with fingerprint


I want to find out whether two audio files are the same, or whether one contains the other.
For this I use the fingerprint classes of musicg:
byte[] firstAudio = readAudioFileData("first.mp3");
byte[] secondAudio = readAudioFileData("second.mp3");
FingerprintSimilarityComputer fingerprint =
new FingerprintSimilarityComputer(firstAudio, secondAudio);
FingerprintSimilarity fingerprintSimilarity = fingerprint.getFingerprintsSimilarity();
System.out.println("clip is found at " + fingerprintSimilarity.getScore());
To convert the audio to a byte array I use the Java Sound API:
public static byte[] readAudioFileData(final String filePath) {
byte[] data = null;
try {
final ByteArrayOutputStream baout = new ByteArrayOutputStream();
final File file = new File(filePath);
final AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file);
byte[] buffer = new byte[4096];
int c;
while ((c = audioInputStream.read(buffer, 0, buffer.length)) != -1) {
baout.write(buffer, 0, c);
}
audioInputStream.close();
baout.close();
data = baout.toByteArray();
} catch (Exception e) {
e.printStackTrace();
}
return data;
}
But when I run it, I get an exception at fingerprint.getFingerprintsSimilarity():
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 15999
at com.musicg.fingerprint.PairManager.getPairPositionList(PairManager.java:133)
at com.musicg.fingerprint.PairManager.getPair_PositionList_Table(PairManager.java:80)
at com.musicg.fingerprint.FingerprintSimilarityComputer.getFingerprintsSimilarity(FingerprintSimilarityComputer.java:71)
at Main.main(Main.java:42)
How can I compare 2 mp3 files with fingerprint in Java?

I never did any audio stuff in Java before, but I looked into your code briefly. I think that musicg only works for WAV files, not for MP3. Thus, you need to convert the files first. A web search reveals that you can e.g. use JLayer for that purpose. The corresponding code looks like this:
package de.scrum_master.so;
import com.musicg.fingerprint.FingerprintManager;
import com.musicg.fingerprint.FingerprintSimilarity;
import com.musicg.fingerprint.FingerprintSimilarityComputer;
import com.musicg.wave.Wave;
import javazoom.jl.converter.Converter;
import javazoom.jl.decoder.JavaLayerException;
public class Application {
public static void main(String[] args) throws JavaLayerException {
// MP3 to WAV
new Converter().convert("White Wedding.mp3", "White Wedding.wav");
new Converter().convert("Poison.mp3", "Poison.wav");
// Fingerprint from WAV
byte[] firstFingerPrint = new FingerprintManager().extractFingerprint(new Wave("White Wedding.wav"));
byte[] secondFingerPrint = new FingerprintManager().extractFingerprint(new Wave("Poison.wav"));
// Compare fingerprints
FingerprintSimilarity fingerprintSimilarity = new FingerprintSimilarityComputer(firstFingerPrint, secondFingerPrint).getFingerprintsSimilarity();
System.out.println("Similarity score = " + fingerprintSimilarity.getScore());
}
}
Of course you should make sure that you do not convert each file again whenever the program starts, i.e. you should check if the WAV files already exist. I skipped this step and reduced the sample code to a minimal working version.
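The "convert only once" check mentioned above could look roughly like the sketch below. The class `ConvertOnce` and the `Mp3ToWav` interface are hypothetical names made up for this illustration; in the real program the lambda would call JLayer's `new Converter().convert(...)` instead of the placeholder used here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConvertOnce {

    /** Hypothetical hook for the real MP3-to-WAV conversion (e.g. JLayer's Converter). */
    @FunctionalInterface
    public interface Mp3ToWav {
        void convert(Path mp3, Path wav) throws Exception;
    }

    /**
     * Runs the converter only if the target WAV file does not exist yet.
     * Returns true if a conversion was performed, false if the existing file was reused.
     */
    public static boolean convertIfMissing(Path mp3, Path wav, Mp3ToWav converter) {
        if (Files.exists(wav)) {
            return false; // reuse the result of an earlier run
        }
        try {
            converter.convert(mp3, wav);
        } catch (Exception e) {
            throw new IllegalStateException("Conversion failed for " + mp3, e);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wav-cache");
        Path mp3 = dir.resolve("song.mp3");
        Path wav = dir.resolve("song.wav");
        // Stand-in converter that only writes a placeholder file.
        Mp3ToWav fake = (in, out) -> Files.write(out, new byte[] {1, 2, 3});
        System.out.println("converted: " + convertIfMissing(mp3, wav, fake));
        System.out.println("converted: " + convertIfMissing(mp3, wav, fake));
    }
}
```

This only checks for the existence of the WAV file; it does not notice when the MP3 has changed since the last conversion.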

FingerprintSimilarityComputer(input1, input2) is supposed to take the fingerprints of the loaded audio data, not the loaded audio data itself.
In your case, it should be:
// Convert your audio to WAV first, e.g. using FFmpeg
Wave w1 = new Wave("first.wav");
Wave w2 = new Wave("second.wav");
FingerprintSimilarityComputer fingerprint =
new FingerprintSimilarityComputer(w1.getFingerprint(), w2.getFingerprint());
System.out.println(fingerprint.getFingerprintsSimilarity().getScore());

Maybe I am missing a point, but if I understood you right, this should do:
byte[] firstAudio = readAudioFileData("first.mp3");
byte[] secondAudio = readAudioFileData("second.mp3");
byte[] smaller = firstAudio.length <= secondAudio.length ? firstAudio : secondAudio;
byte[] bigger = firstAudio.length > secondAudio.length ? firstAudio : secondAudio;
int ixS = 0;
int ixB = 0;
boolean contains = false;
for (; ixB < bigger.length; ixB++) {
if (smaller[ixS] == bigger[ixB]) {
ixS++;
if (ixS == smaller.length) {
contains = true;
break;
}
}
else {
// Mismatch: step back so that the next candidate start position is not skipped
ixB -= ixS;
ixS = 0;
}
}
if (contains) {
if (smaller.length == bigger.length) {
System.out.println("Both tracks are equal");
}
else {
// ixB points at the last matched byte, so the match starts smaller.length - 1 bytes earlier
System.out.println("The bigger track fully contains the smaller track, starting at byte: " + (ixB - smaller.length + 1));
}
}
else {
System.out.println("No track completely contains the other track");
}

Related

Error of FFmpeg on Java in "av_image_copy_to_buffer" method during decoding H.264 stream

I'm trying to decode an H.264 stream that is sent over a socket from an Android application to a computer. I also want to display the decoded stream using JavaFX. After a long search I decided to use JavaCV / FFmpeg, but I get an error from FFmpeg. (I was inspired by this code.)
Questions:
Why does FFmpeg return an error?
Is this the correct way to convert an AVFrame to a javafx.scene.image.Image?
I'm using:
javacv-platform 1.4.4
ffmpeg-platform 4.1-1.4.4
Code:
This is part of the imports and class fields, plus the method that runs once at startup. (The content of initialize() is actually wrapped in try/catch.)
import javafx.scene.image.Image;
private avcodec.AVCodec avCodec;
private avcodec.AVCodecContext avCodecContext;
private avutil.AVDictionary avDictionary;
private avutil.AVFrame avFrame;
public void initialize() {
avCodec = avcodec_find_decoder(AV_CODEC_ID_H264);
if (avCodec == null) {
throw new RuntimeException("Can't find decoder");
}
avCodecContext = avcodec_alloc_context3(avCodec);
if (avCodecContext == null) {
throw new RuntimeException("Can't allocate decoder context");
}
int result = avcodec_open2(avCodecContext, avCodec, (AVDictionary) null);
if (result < 0) {
throw new RuntimeException("Can't open decoder");
}
avFrame = av_frame_alloc();
if (avFrame == null) {
throw new RuntimeException("Can't allocate frame");
}
}
This method is called every time I receive a packet from Android. byte[] data is the packet data, starting with 0x00, 0x00, 0x00, 0x01.
The error shows up in number_of_written_bytes, which is always < 0.
private void decode(byte[] data) {
AVPacket avPacket = new AVPacket();
av_init_packet(avPacket);
avPacket.pts(AV_NOPTS_VALUE);
avPacket.dts(AV_NOPTS_VALUE);
BytePointer bytePointer = new BytePointer(data);
bytePointer.capacity(data.length);
avPacket.data(bytePointer);
avPacket.size(data.length);
avPacket.pos(-1);
avcodec_send_packet(avCodecContext, avPacket);
int result = avcodec_receive_frame(avCodecContext, avFrame);
if (result >= 0) {
int bufferOutputSize = av_image_get_buffer_size(avFrame.format(), avFrame.width(), avFrame.height(), 16);
Pointer pointer = av_malloc(bufferOutputSize);
BytePointer outputPointer = new BytePointer(pointer);
int number_of_written_bytes = av_image_copy_to_buffer(outputPointer, bufferOutputSize, avFrame.data(), avFrame.linesize(), avFrame.chroma_location(), avFrame.width(), avFrame.height(), 1);
if (number_of_written_bytes < 0) {
// Execution always ends up here.
throw new RuntimeException("Can't copy image to buffer");
}
System.out.println("decode success");
Image image = new Image(new ByteArrayInputStream(outputPointer.asBuffer().array()));
} else {
System.out.println("decode failed");
}
}
Any help is appreciated. Thanks.

Huffman code - can't decompress BMP files using BitSet

I built a classic Huffman coder, with an encoder and a decoder. I use a BitSet to compress the input file, but the BitSet-based decoder does not handle every file I give it: a txt file works fine, but other files such as BMP do not.
Before I used BitSet the code worked, although without any compression, so I suspect the problem is with BitSet.
The decoder I built is:
public void Decompress(String[] input_names, String[] output_names) {
HuffmanVerticle tree = new HuffmanVerticle();
tree = readTreeFile(output_names);
restoreInput(tree, output_names, input_names);
}
public static void restoreInput(HuffmanVerticle tree, String[] binary_names, String[] original_names) {
BitSet huffmanCodeBit;
try {
FileOutputStream to_original = new FileOutputStream(original_names[0]);
FileInputStream binary = new FileInputStream(binary_names[0]);
ObjectInputStream s = new ObjectInputStream(binary);
huffmanCodeBit = (BitSet) s.readObject();
System.out.println(huffmanCodeBit.toString());
int index = 0;
while(huffmanCodeBit.length() > index)
{
HuffmanVerticle tmp = tree;
while (!tmp.isNullTree())
{
boolean bit = huffmanCodeBit.get(index);
index++;
System.out.println(bit);
if (!bit)
tmp = tmp.left;
else
tmp = tmp.right;
}
to_original.write(tmp.character);
}
binary.close();
to_original.close();
} catch (Exception e) {
e.printStackTrace();
}
}
What am I missing here? Why doesn't the code work for certain files? The files that come back from decompression are broken.
For BMP files the code does not finish at all, even after half an hour, whereas txt files are processed very quickly.
Thanks for your help.
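One property of java.util.BitSet that may be relevant to the decoder loop above: BitSet.length() returns the index of the highest set bit plus one, so 0 bits at the very end of the encoded stream are not counted. The sketch below only demonstrates that behaviour; the class name `BitSetLengthDemo` is made up for this illustration, and whether this is the actual cause of the broken BMP output is an assumption, not something the question confirms.

```java
import java.util.BitSet;

public class BitSetLengthDemo {

    /** Number of bits a decoder sees if it uses BitSet.length() as the end marker. */
    public static int lengthSeenByDecoder(boolean[] bits) {
        BitSet set = new BitSet();
        for (int i = 0; i < bits.length; i++) {
            set.set(i, bits[i]);
        }
        return set.length(); // index of the highest set bit + 1
    }

    public static void main(String[] args) {
        // Six code bits were written; the last three of them are 0.
        boolean[] written = {true, false, true, false, false, false};
        System.out.println("bits written:    " + written.length);
        System.out.println("BitSet.length(): " + lengthSeenByDecoder(written));
    }
}
```

If this turns out to matter, one option is to store the number of written bits next to the BitSet and loop up to that count. Separately, printing every single bit with System.out.println can be very slow on large inputs such as BMP files.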

Image comparison performance java

I have the code below, but it is very slow, and the more pictures I have to compare, the longer it takes.
For example, with 500 pictures and 2 minutes per picture, that is 500 x 2 min = 1000 min!
The special requirement is that as soon as a picture is found to be identical to another one, it is moved to another folder. The remaining files are then listed again for the next comparison.
Any ideas?
public static void main(String[] args) throws IOException {
String PicturesFolderPath=null;
String removedFolderPath=null;
String pictureExtension=null;
if(args.length>0) {
PicturesFolderPath=args[0];
removedFolderPath=args[1];
pictureExtension=args[2];
}
if(StringUtils.isBlank(pictureExtension)) {
pictureExtension="jpg";
}
if(StringUtils.isBlank(removedFolderPath)) {
removedFolderPath=Paths.get(".").toAbsolutePath().normalize().toString()+"/removed";
}
if(StringUtils.isBlank(PicturesFolderPath)) {
PicturesFolderPath=Paths.get(".").toAbsolutePath().normalize().toString();
}
System.out.println("path to find pictures folder "+PicturesFolderPath);
System.out.println("path to find removed pictures folder "+removedFolderPath);
Collection<File> fileList = FileUtils.listFiles(new File(PicturesFolderPath), new String[] { pictureExtension }, false);
System.out.println("there is "+fileList.size()+" files founded with extention "+pictureExtension);
Iterator<File> fileIterator=fileList.iterator();
//Iterator<File> loopFileIterator=fileList.iterator();
File dest=new File(removedFolderPath);
while(fileIterator.hasNext()) {
File file=fileIterator.next();
System.out.println("process image :"+file.getName());
//each new iteration we retrieve the files staying
Collection<File> list = FileUtils.listFiles(new File(PicturesFolderPath), new String[] { pictureExtension }, false);
for(File f:list) {
if(compareImage(file,f) && !file.getName().equals(f.getName()) ) {
String filename=file.getName();
System.out.println("file :"+file.getName() +" equal to "+f.getName()+" and will be moved on removed folder");
File existFile=new File(removedFolderPath+"/"+file.getName());
if(existFile.exists()) {
existFile.delete();
}
FileUtils.moveFileToDirectory(file, dest, false);
fileIterator.remove();
System.out.println("file :"+filename+" removed");
break;
}
}
}
}
// Compares the pixel data of two image files.
// Returns true if both images are equal, otherwise false.
public static boolean compareImage(File fileA, File fileB) {
try {
// take buffer data from both image files
BufferedImage biA = ImageIO.read(fileA);
DataBuffer dbA = biA.getData().getDataBuffer();
int sizeA = dbA.getSize();
BufferedImage biB = ImageIO.read(fileB);
DataBuffer dbB = biB.getData().getDataBuffer();
int sizeB = dbB.getSize();
// compare data-buffer objects //
if(sizeA == sizeB) {
for(int i=0; i<sizeA; i++) {
if(dbA.getElem(i) != dbB.getElem(i)) {
return false;
}
}
return true;
}
else {
return false;
}
}
catch (Exception e) {
e.printStackTrace();
return false;
}
}

The already mentioned answer should help you a bit, because checking the width and height of a picture first excludes many candidate pairs quickly.
However, you still have a big problem: for every new file, you read all the old files again. The number of comparisons grows quadratically, and with an ImageIO.read at every step it is bound to be slow.
You need fingerprints that can be compared very fast. You can't fingerprint the whole file content, because it is polluted by metadata, but you can fingerprint the image data alone.
Just iterate over the image data of a file (like you already do) and compute, for example, an MD5 hash of it. Store the hash, for example as a String, in a HashSet and you get a very fast lookup.
Some untested code:
For every image file you want to compare, you compute (using Guava's hashing)
HashCode imageFingerprint(File file) throws IOException {
Hasher hasher = Hashing.md5().newHasher();
BufferedImage image = ImageIO.read(file);
DataBuffer buffer = image.getData().getDataBuffer();
int size = buffer.getSize();
for(int i=0; i<size; i++) {
hasher.putInt(buffer.getElem(i));
}
return hasher.hash();
}
The computation works with the image data only, just like compareImage in the question, so the metadata get ignored.
Instead of searching for a duplicate in a directory, you compute the fingerprints of all its files and store them in a HashSet<HashCode>. For a new file, you compute its fingerprint and look it up in the set.
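For readers without Guava, the same idea can be sketched with the JDK's MessageDigest and a HashSet. This is only an illustration under some assumptions: the class name `ImageFingerprints` and its methods are made up here, the image dimensions are mixed into the hash as an extra safeguard, and two images only match if they are decoded into the same internal pixel layout (the same limitation as compareImage in the question).

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

public class ImageFingerprints {

    /** Hex MD5 digest over the image dimensions and the raw pixel buffer. */
    public static String fingerprint(BufferedImage image) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is available on every Java platform
        }
        // Include the dimensions so that e.g. 2x8 and 4x4 images never collide trivially.
        md5.update((image.getWidth() + "x" + image.getHeight()).getBytes(StandardCharsets.UTF_8));
        DataBuffer buffer = image.getData().getDataBuffer();
        for (int i = 0; i < buffer.getSize(); i++) {
            int v = buffer.getElem(i);
            md5.update((byte) (v >>> 24));
            md5.update((byte) (v >>> 16));
            md5.update((byte) (v >>> 8));
            md5.update((byte) v);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true if an image with the same fingerprint was already added to the set. */
    public static boolean isDuplicate(Set<String> seen, BufferedImage image) {
        return !seen.add(fingerprint(image));
    }

    public static void main(String[] args) {
        BufferedImage first = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        BufferedImage copy = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        first.setRGB(1, 1, 0xFF0000);
        copy.setRGB(1, 1, 0xFF0000);
        Set<String> seen = new HashSet<>();
        System.out.println("first is duplicate: " + isDuplicate(seen, first));
        System.out.println("copy is duplicate:  " + isDuplicate(seen, copy));
    }
}
```

With this approach each file is read and hashed once, and every later lookup is a set membership test instead of a pixel-by-pixel comparison against all other files.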

Encoding PDF files to Base64 takes a long time for 100k documents

I am trying to encode PDF documents to Base64. With a small number of documents (around 2000) it works nicely, but I have more than 100k documents to encode.
Encoding all of those files takes a long time. Is there a better approach for encoding such a large data set?
Here is my current approach:
String filepath=doc.getPath().concat(doc.getFilename());
file = new File(filepath);
if(file.exists() && !file.isDirectory()) {
try {
FileInputStream fileInputStreamReader = new FileInputStream(file);
byte[] bytes = new byte[(int) file.length()];
fileInputStreamReader.read(bytes);
encodedfile = new String(Base64.getEncoder().encodeToString(bytes));
fileInputStreamReader.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}

Try this:
Figure out how many files you need to encode.
long files = Files.list(Paths.get(directory)).count();
Split them into chunks of a size that one thread can reasonably handle. For example, if you have 100k files to encode, split them into 100 lists of 1000, or something like that.
int currentIndex = 0;
for (File file : filesInDir) {
if (fileMap.get(currentIndex).size() >= cap)
currentIndex++;
fileMap.get(currentIndex).add(file);
}
/* It's going to take a little more effort than this, but this is the idea I'm trying to show you. */
Start the worker threads one after another, as long as the computer's resources are available.
for (Integer key : fileMap.keySet()) {
new WorkerThread(fileMap.get(key)).start();
}
You can check the current resources available with:
public boolean areResourcesAvailable() {
return imNotThatNice();
}
/**
* Gets the resource utility instance
*
* @return the current instance of the resource utility
*/
private static OperatingSystemMXBean getInstance() {
if (ResourceUtil.instance == null) {
ResourceUtil.instance = ManagementFactory.getOperatingSystemMXBean();
}
return ResourceUtil.instance;
}
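As an alternative to managing worker threads by hand, the same splitting idea can be sketched with a fixed-size thread pool from java.util.concurrent. This swaps in an ExecutorService rather than the WorkerThread class used above, and the class name `ParallelBase64` and its methods are hypothetical names for this illustration. Whether it helps in practice depends on whether the job is limited by CPU or by disk speed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelBase64 {

    /** Reads one file completely and returns its content as Base64 text. */
    public static String encode(Path file) {
        try {
            return Base64.getEncoder().encodeToString(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Encodes all files on a fixed-size pool; results are returned in input order. */
    public static List<String> encodeAll(List<Path> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (Path file : files) {
                Callable<String> task = () -> encode(file);
                pending.add(pool.submit(task));
            }
            List<String> result = new ArrayList<>();
            for (Future<String> future : pending) {
                result.add(future.get()); // waits until this file is done
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while encoding", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Encoding failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("pdfs");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            files.add(Files.write(dir.resolve("doc" + i + ".pdf"), ("content " + i).getBytes()));
        }
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(encodeAll(files, threads).size() + " files encoded");
    }
}
```

For 100k documents it is probably better to write or send each encoded result as soon as it is ready rather than collecting all strings in one list, since Base64 output is about a third larger than the input.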

What exactly does AudioInputStream.read method return?

I have trouble finding out what I actually read with the AudioInputStream. The program below just prints the byte array I get, but I don't even know whether the bytes are actually the samples, i.e. whether the byte array is the audio wave.
File fileIn;
AudioInputStream audio_in;
byte[] audioBytes;
int numBytesRead;
int numFramesRead;
int numBytes;
int totalFramesRead;
int bytesPerFrame;
try {
audio_in = AudioSystem.getAudioInputStream(fileIn);
bytesPerFrame = audio_in.getFormat().getFrameSize();
if (bytesPerFrame == AudioSystem.NOT_SPECIFIED) {
bytesPerFrame = 1;
}
numBytes = 1024 * bytesPerFrame;
audioBytes = new byte[numBytes];
try {
numBytesRead = 0;
numFramesRead = 0;
} catch (Exception ex) {
System.out.println("Something went completely wrong");
}
} catch (Exception e) {
System.out.println("Something went completely wrong");
}
In another part, I read some bytes like this:
try {
if ((numBytesRead = audio_in.read(audioBytes)) != -1) {
numFramesRead = numBytesRead / bytesPerFrame;
totalFramesRead += numFramesRead;
}
} catch (Exception e) {
System.out.println("Had problems reading new content");
}
First of all, this code is not mine. This is my first time reading audio files, so I got some help from the web. (I found it in the Stack Overflow question "Java - reading, manipulating and writing WAV files".)
The question is: what do the bytes in audioBytes represent? Since the source is 44 kHz stereo, there have to be two waves hiding in there somewhere, right? So how do I extract the relevant information from these bytes?
// EDIT
So what I added is this function:
public short[] Get_Sample() {
if(samplesRead == 1024) {
Read_Buffer();
samplesRead = 4;
} else {
samplesRead = samplesRead + 4;
}
short sample[] = new short[2];
sample[0] = (short)(audioBytes[samplesRead-4] + 256*audioBytes[samplesRead-3]);
sample[1] = (short)(audioBytes[samplesRead-2] + 256*audioBytes[samplesRead-1]);
return sample;
}
where Read_Buffer() reads the next 1024 (or fewer) bytes and loads them into audioBytes. sample[0] is used for the left channel, sample[1] for the right channel. But I'm still not sure, since the waves I get from this look quite "noisy". (Edit: the WAV file actually used little-endian byte order, so I had to change the calculation.)

The AudioInputStream read() method returns the raw audio data. You don't know the layout of the data until you read the audio format with getFormat(), which returns an AudioFormat. From the AudioFormat you can call getChannels(), getSampleSizeInBits() and more. This works because an AudioInputStream always carries a known format.
When you calculate a sample value, you have to account for the signedness and endianness of the data (in the case of 16-bit samples). To make the code more generic, use the AudioFormat object returned by the AudioInputStream to get more information about the data buffer:
getEncoding() : PCM_SIGNED, PCM_UNSIGNED ...
isBigEndian() : true or false
As you already discovered, building samples incorrectly may lead to distorted sound. If you work with various files, it may cause problems in the future. If you don't want to support some formats, just check what the AudioFormat says and throw an exception (e.g. javax.sound.sampled.UnsupportedAudioFileException). It will save you time.
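To make the point about endianness concrete, here is a minimal sketch that splits interleaved 16-bit signed PCM bytes into one array per channel. The class name `PcmDecoder` and the method `decode16Bit` are made up for this illustration; it assumes the encoding is PCM_SIGNED with 16 bits per sample, and the `channels` and `bigEndian` arguments would normally come from AudioFormat.getChannels() and AudioFormat.isBigEndian().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmDecoder {

    /**
     * Splits interleaved 16-bit signed PCM data into one short[] per channel.
     * Only the first 'length' bytes of audioBytes are used (the value returned by read()).
     */
    public static short[][] decode16Bit(byte[] audioBytes, int length, int channels, boolean bigEndian) {
        int frameSize = 2 * channels;      // 2 bytes per sample, one sample per channel
        int frames = length / frameSize;   // a trailing partial frame is ignored
        ByteBuffer buffer = ByteBuffer.wrap(audioBytes, 0, frames * frameSize)
                .order(bigEndian ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        short[][] samples = new short[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                samples[c][f] = buffer.getShort();
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // Two little-endian stereo frames: (1, -1) and (-32768, 32767)
        byte[] raw = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, (byte) 0x80, (byte) 0xFF, 0x7F};
        short[][] samples = decode16Bit(raw, raw.length, 2, false);
        System.out.println("left:  " + samples[0][0] + ", " + samples[0][1]);
        System.out.println("right: " + samples[1][0] + ", " + samples[1][1]);
    }
}
```

When combining the two bytes by hand, the low byte has to be masked, as in `(short) ((high << 8) | (low & 0xFF))`, because Java bytes are signed. A plain `low + 256 * high` gives wrong values whenever the low byte is negative, which shows up as noise in the wave.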
