Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I have quite a weird question, but here it is:
Is it possible, and are there any guides for writing a custom video codec in C++ or Java?
Here's the weird part: I don't need to dive into all the material about audio and video compression, which I don't understand. What I actually need is the technical background on how to make a software layer between a movie player and a movie file.
Here's why: I would like to create a library, or ultimately two functions (encode / decode) in C++ / Java, which take the raw binary input of any type of file and encode / decode it according to a given password or something like that. Then I need to put this processing between a movie player and a movie file. The final result will be a password-protected MP4 / AVI / MPEG / WMV file (the format doesn't really matter) that can be played only with this "codec". The internal logic of the codec is not the issue right now.
How I imagine it is like a stream: the movie player requests the file and calls my decode() function, which takes a chunk of the file, decodes it (it was previously encoded) and returns the correct bytes in WMV/MP4/etc. format.
Is any of this possible and how?
A codec generally takes image blocks and context information, transforms and quantizes the data, applies predictions, then encodes the resulting error stream using one of any number of coding schemes.
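As a rough illustration of the "predict, quantize, then code the error stream" part, here is a minimal Java sketch of DPCM (differential pulse-code modulation) on one row of sample values. It is not how any particular codec works: it leaves out the transform and the entropy coding, and the class name `DpcmSketch` and the step size `q` are hypothetical. The encoder predicts each value from the previously *reconstructed* value, so the decoder, which only sees the quantized residuals, stays in sync with it.

```java
public final class DpcmSketch {

    // Encoder: predict each sample from the previous reconstructed sample,
    // quantize the prediction error with step q, and emit the quantized residuals.
    public static int[] encode(int[] samples, int q) {
        int[] residuals = new int[samples.length];
        int prediction = 0; // both sides start from the same (hypothetical) initial prediction
        for (int i = 0; i < samples.length; i++) {
            int error = samples[i] - prediction;
            int quantized = Math.round((float) error / q);
            residuals[i] = quantized;
            // Track what the decoder will reconstruct, not the original sample,
            // so quantization errors do not accumulate (closed-loop prediction).
            prediction += quantized * q;
        }
        return residuals;
    }

    // Decoder: rebuild each sample as prediction + dequantized residual.
    public static int[] decode(int[] residuals, int q) {
        int[] out = new int[residuals.length];
        int prediction = 0;
        for (int i = 0; i < residuals.length; i++) {
            prediction += residuals[i] * q;
            out[i] = prediction;
        }
        return out;
    }
}
```

With `q = 1` the round trip is lossless; a larger `q` makes the residuals smaller (cheaper to entropy-code) at the cost of an error of at most `q / 2` per value.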
The API is usually simple. For encode, you send blocks of image data (frames) to the encoder, and it generates a stream of bits. You may be responsible for writing the container (file format) yourself. For decode, you stream bits in and frames come out.
There is absolutely no standard to any of this -- the technologies used in the codecs are sometimes standardised, but the exact interfaces are not.
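For the password layer described in the question (as opposed to a real video codec), encode and decode can work on raw byte chunks, as the question proposes. The sketch below assumes AES in CTR mode with a key derived from the password via PBKDF2; that is a swapped-in choice, since the question leaves the internal logic open. CTR mode lets any chunk at any byte offset be decoded on its own, which suits a player that seeks. The class name `PasswordChunkCodec`, the iteration count and the salt/IV handling are hypothetical. This does not by itself make a stock player read the file: the player still has to receive its bytes through something you control (for example a custom data source or a local HTTP server) that calls `process()` on each chunk it reads.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

public final class PasswordChunkCodec {
    private final SecretKeySpec key;
    private final byte[] iv; // 16-byte initial counter block, e.g. stored next to the file (hypothetical layout)

    public PasswordChunkCodec(char[] password, byte[] salt, byte[] iv) throws GeneralSecurityException {
        if (iv.length != 16) throw new IllegalArgumentException("IV must be 16 bytes");
        // Derive a 128-bit AES key from the password.
        SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
        byte[] raw = factory.generateSecret(new PBEKeySpec(password, salt, 65536, 128)).getEncoded();
        this.key = new SecretKeySpec(raw, "AES");
        this.iv = iv.clone();
    }

    // Encodes or decodes `chunk`, which starts at byte `offset` of the file.
    // In CTR mode both directions are the same XOR with the keystream.
    public byte[] process(byte[] chunk, long offset) throws GeneralSecurityException {
        long block = offset / 16;
        int skip = (int) (offset % 16);
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(counterFor(block)));
        // Prepend `skip` dummy bytes so the keystream lines up with the chunk, then drop them.
        byte[] padded = new byte[skip + chunk.length];
        System.arraycopy(chunk, 0, padded, skip, chunk.length);
        byte[] out = cipher.doFinal(padded);
        return Arrays.copyOfRange(out, skip, out.length);
    }

    // Adds `block` to the IV, treating the 16 bytes as one big-endian counter.
    private byte[] counterFor(long block) {
        byte[] ctr = iv.clone();
        long carry = block;
        for (int i = ctr.length - 1; i >= 0 && carry != 0; i--) {
            long sum = (ctr[i] & 0xFF) + (carry & 0xFF);
            ctr[i] = (byte) sum;
            carry = (carry >>> 8) + (sum >>> 8);
        }
        return ctr;
    }
}
```

Because decoding is keyed by the byte offset, a reader can serve a seek to any position without touching the bytes before it. Note that CTR mode only hides the content; it does not detect tampering.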
MediaTool (from Xuggler) is a simple Application Programming Interface (API) for decoding, encoding and modifying video in Java; see the MediaTool Introduction:
http://wiki.xuggle.com/MediaTool_Introduction#How%5FTo%5FTake%5FSnapshots%5FOf%5FYour%5FDesktop
Java Media Framework (JMF) guide:
http://wwwinfo.deis.unical.it/fortino/teaching/gdmi0708/materiale/jmf2_0-guide.pdf
Maybe these help!
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have hundreds of images of handwritten notes. They were written by different people, but they are in sequence, so you know that, for example, person1 wrote img1.jpg -> img100.jpg. The style of handwriting varies a lot from person to person, but there are parts of the notes which are always fixed; I imagine that could help an algorithm (it helps me!).
I tried Tesseract and it failed pretty badly at recognizing the text. Since each person has about 100 images, is there an algorithm I can train by feeding it a small number of examples, like 5 or fewer, so that it learns from them? Or would that not be enough data? From searching around, it looks like I need to implement a CNN (e.g. this paper).
My knowledge of AI is limited though. Is this something I could still do using a library and some studying? If so, what should I do going forward?
This is called OCR (optical character recognition), and there has been a lot of progress in it. Here is an example of how simple it is to extract text from an image file using Tesseract:
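```python
try:
    from PIL import Image
except ImportError:
    import Image  # very old standalone PIL installs
import pytesseract

def ocr_core(filename):
    """Run Tesseract OCR on an image file and return the recognized text."""
    text = pytesseract.image_to_string(Image.open(filename))
    return text

print(ocr_core('sample.png'))
```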
BUT: I am not sure that it can recognize different styles of handwriting; you will have to try it yourself to find out. To run the Python example, first install the Tesseract engine on your OS and add it to your PATH, then install the pytesseract and Pillow packages.
There are many OCR engines out there, and some perform better than others. However, this field has improved a lot recently thanks to deep neural networks. I would consider using a cloud provider such as Azure, Google Cloud or Amazon: you upload the image and they return the recognized text and other metadata.
For instance:
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
If you don't want to use cloud services for any reason, I would consider using TensorFlow... but some knowledge is required:
Tensorflow model for OCR
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
I'm creating an Android app which has to identify drawings made by children. The structure of a drawing is this:
I noticed that the Google Cloud Vision AI tool can identify that a drawing depicts an animal, and this is exactly what I'm looking for. Since a child can draw and write whatever they want (which means text and numbers are included), I understood that this could require the paid features of the service. Are there tools like this one that identify objects and human beings and that can be used in Android?
The Google Cloud Vision API allows one to submit an image and get back an interpretation of that image. The data returned can contain a variety of sections. At the highest level these are:
Labels
Text
Document (OCR)
Safe search detection
Face detection
Landmark detection
Logo detection
Image properties
Web detection (similar images)
Crop hints (cropping suggestions)
When you supply an image, you have the choice of how many of these features are examined from the supplied image. When you make the API call to process the image, you declare which (one, some or all) of the above are to be processed. From your description, it sounds like you are looking for label detection and nothing more.
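To make sure only label detection is requested (and billed), the request can list a single feature. Below is a minimal Java sketch against the Cloud Vision REST method `images:annotate`, assuming API-key authentication is enabled for your project; the class and method names are hypothetical. `java.util.Base64` needs Android API 26+ (`android.util.Base64` works on older versions), and on Android the network call must run off the main thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class LabelOnlyRequest {
    // REST endpoint of the Cloud Vision images:annotate method.
    static final String ENDPOINT = "https://vision.googleapis.com/v1/images:annotate";

    // Builds a request body that asks for LABEL_DETECTION and nothing else.
    public static String buildBody(byte[] imageBytes, int maxResults) {
        String content = Base64.getEncoder().encodeToString(imageBytes);
        return "{\"requests\":[{"
                + "\"image\":{\"content\":\"" + content + "\"},"
                + "\"features\":[{\"type\":\"LABEL_DETECTION\",\"maxResults\":" + maxResults + "}]"
                + "}]}";
    }

    // POSTs the body using an API key and returns the HTTP status code.
    // The JSON response (labels under responses[0].labelAnnotations) is readable from conn.getInputStream().
    public static int post(String body, String apiKey) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(ENDPOINT + "?key=" + apiKey).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```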
Since Google has to run significant compute and other data processing work against the image, there is a charge for interpreting an image. According to the pricing page, Google charges in units of 1000 images, and there is a free tier of 1000 images/month. If you need to process fewer than this, there should be no cost. If you need to process more than 1000/month, the charges appear to vary with the processing requested: for example, it appears to be $1.50 for every 1000 images above the first 1000/month that are free. The price decreases automatically if you have very high volumes of images to process. If you ONLY need label detection, make sure that is all you request in the API call; if you request additional interpretations of the image, you will be billed for those as well.
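A back-of-the-envelope estimate using only the figures quoted in this answer ($1.50 per 1000 images after 1000 free per month) might look like this. It assumes, hypothetically, that each requested feature is billed at that same rate with its own free tier, and it ignores the high-volume discounts; check the current pricing page for real numbers.

```java
public final class VisionCostEstimate {
    // Figures quoted in the answer above; not authoritative.
    static final int FREE_UNITS_PER_MONTH = 1000;
    static final double USD_PER_1000_UNITS = 1.50;

    // Assumption: every requested feature costs the same and has its own free tier.
    public static double monthlyUsd(int imagesPerMonth, int featuresPerImage) {
        int billable = Math.max(0, imagesPerMonth - FREE_UNITS_PER_MONTH);
        return featuresPerImage * billable * USD_PER_1000_UNITS / 1000.0;
    }
}
```

For example, 3000 images a month with label detection only gives (3000 - 1000) × 1.50 / 1000 = $3.00; under these assumptions, also requesting a second feature doubles that to $6.00.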
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am a beginner in Android programming. I am developing a sound detection app in Android Studio. Can somebody help me detect a specific sound?
Try using musicg.
musicg is a lightweight audio analysis library, written in Java, with the purpose of extracting both high-level and low-level audio features. This API allows developers to extract audio features and operate on audio data (reading, cutting, trimming) easily from an InputStream. It also provides tools for digital signal processing and renders the waveform or spectrogram for research and development purposes.
Add the musicg library to your project and try this code:
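```java
import com.musicg.fingerprint.FingerprintSimilarity;
import com.musicg.wave.Wave;

// File paths are placeholders; both files must be WAV files
Wave w1 = new Wave("first.wav");   // base audio file
Wave w2 = new Wave("second.wav");  // audio file to compare

// Compute the audio fingerprint similarity
FingerprintSimilarity fps = w1.getFingerprintSimilarity(w2);
float score = fps.getScore();
float sim = fps.getSimilarity();
```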
sim contains the similarity between the two audio files (values range from 0 to 1.0). A value greater than 0.3 can be considered a similar sound. musicg works with 16-bit PCM audio files.
Note that getFingerprintSimilarity() accepts only WAV files.
This app uses musicg for sound detection.
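Since `getFingerprintSimilarity()` works on WAV files and musicg expects 16-bit PCM, audio recorded on the device as raw 16-bit PCM (for example with Android's `AudioRecord`) needs a WAV header before musicg can read it. A minimal sketch of the standard 44-byte RIFF/WAVE header, assuming little-endian 16-bit samples; the class name `PcmToWav` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class PcmToWav {
    // Wraps raw 16-bit little-endian PCM bytes in a canonical 44-byte WAV header.
    public static byte[] wrap(byte[] pcm, int sampleRate, int channels) {
        int bitsPerSample = 16;
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        ByteBuffer b = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcm.length);                 // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                              // fmt chunk size for PCM
        b.putShort((short) 1);                     // audio format 1 = uncompressed PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(sampleRate * blockAlign);         // byte rate
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcm.length);                      // data chunk size
        b.put(pcm);
        return b.array();
    }
}
```

The returned bytes can be written to a `.wav` file and passed to `new Wave(path)` in the same way as the files in the musicg example.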
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I need to create a "Morse code decoder" for Android, very similar to this app: https://play.google.com/store/apps/details?id=org.jfedor.morsecode
My app must listen to sounds (Morse code) from the microphone and translate the code into the original text.
To be honest, this feature is part of a larger project. My intent is to create a system:
ENCODE: a Java application that translates text into sound (in this case I have chosen Morse code, since we don't have much time to create our own "alphabet"). So, it is text-to-sound.
DECODE: an Android app that "listens" to this sound (the Morse code) and obtains the original text. So, it is sound-to-text.
Creating the Java application isn't a problem, but the Android app is: listening to the sound is OK, but UNDERSTANDING IT is the issue.
Just break the problem down into the parts. There's:
1) recording from the microphone [ok, no problem]
2) detecting the start times of the tones
3) building these up into a sequence of dots and dashes.
4) translating this into text
I would start from step 2). I thought of doing it like this: I set the app to listen for a tone at a certain frequency and speed. It must recognize the Morse code, translate it and print the original text for the user. But how? I do not know where to start. Any ideas?
Just break the problem down into the parts. There's:
1) recording from the microphone
2) detecting the start times of the tones
3) building these up into a sequence of dots and dashes.
4) translating this into text
None of those seems particularly difficult on its own. 2) and 3) are probably hardest, especially if the speed of the signal varies a lot or if you need to handle errors. So perhaps you could start there with some pre-recorded audio files.
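Steps 2) to 4) can be sketched in plain Java as below, assuming the tone frequency and the sending speed (the length of one Morse "unit" in analysis frames) are known in advance, as the question proposes. Tone detection uses the Goertzel algorithm, which is a swapped-in choice since the answer does not name a method; the class name `MorseSketch`, the power threshold and the frame size are hypothetical and would need tuning on real recordings. Timing follows the standard Morse proportions: a dot is 1 unit, a dash 3 units, the gap between letters 3 units and the gap between words 7 units. Only the letters A to Z are included.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public final class MorseSketch {
    private static final String[] CODES = {".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
        ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-", "..-", "...-", ".--",
        "-..-", "-.--", "--.."};
    private static final Map<String, Character> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < CODES.length; i++) TABLE.put(CODES[i], (char) ('A' + i));
    }

    // Step 2: Goertzel power of a single frequency in one frame of 16-bit PCM samples.
    public static double goertzelPower(short[] frame, int sampleRate, double freq) {
        double coeff = 2 * Math.cos(2 * Math.PI * freq / sampleRate);
        double s1 = 0, s2 = 0;
        for (short x : frame) {
            double s0 = x + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Step 2: mark each fixed-size frame as tone on/off by comparing its power to a threshold.
    public static boolean[] toneOnOff(short[] pcm, int frameSize, int sampleRate, double freq, double threshold) {
        boolean[] on = new boolean[pcm.length / frameSize];
        for (int f = 0; f < on.length; f++) {
            short[] frame = Arrays.copyOfRange(pcm, f * frameSize, (f + 1) * frameSize);
            on[f] = goertzelPower(frame, sampleRate, freq) > threshold;
        }
        return on;
    }

    // Steps 3 and 4: measure runs of on/off frames, turn them into dots, dashes
    // and gaps, and look each finished letter up in the Morse table.
    public static String decode(boolean[] on, int unitFrames) {
        StringBuilder text = new StringBuilder();
        StringBuilder symbol = new StringBuilder();
        int i = 0;
        while (i < on.length) {
            boolean state = on[i];
            int run = 0;
            while (i < on.length && on[i] == state) { run++; i++; }
            double units = (double) run / unitFrames;
            if (state) {
                symbol.append(units < 2 ? '.' : '-');   // dot = 1 unit, dash = 3 units
            } else if (units >= 2) {                    // gap of about 3 units ends a letter
                flush(symbol, text);
                if (units >= 5) text.append(' ');       // gap of about 7 units ends a word
            }
        }
        flush(symbol, text);
        return text.toString().trim();
    }

    private static void flush(StringBuilder symbol, StringBuilder text) {
        if (symbol.length() > 0) {
            text.append(TABLE.getOrDefault(symbol.toString(), '?')); // '?' for unknown patterns
            symbol.setLength(0);
        }
    }
}
```

On Android, the `short[]` samples could come from `AudioRecord.read(short[], int, int)`. Testing first with pre-recorded files, as suggested above, makes it easier to tune the threshold and unit length before dealing with the microphone.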
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
I'm looking to do some image processing in Java and I'll be developing in Ubuntu with Eclipse.
So here is my objective:
From a greyscale image, I would like to be able to detect objects of a certain size and draw a rectangular frame around them. However, the catch is that this image is captured by a thermal imaging camera, so the pixels showing body heat will have values within a certain range.
After detecting all the objects in the image, I will need to count them, but that's later.
So here's my challenge: which tools/APIs/open classes can I use to do something like this? I looked around and found some basic manipulations such as rotate, crop and resize, but haven't really found anything I can use.
Where should I look/start?
thanks a lot in advance
ImageJ is very useful:
http://rsbweb.nih.gov/ij/
Although ImageJ is set up as a GUI, you can also use it as a library (I do that myself).
You'll have to search for a suitable object detection plugin (but there are some floating around...)
Good luck!
Eelco
On this page you can find an open-source tool for image processing and image mining:
http://spl.utko.feec.vutbr.cz/en/image-processing-extension-for-rapidminer-5
This article fully explains the algorithm you're looking for, and the accompanying source code is here. You can see it in action in this video.
(Disclaimer: I'm the author; but I do think this is very useful, and have successfully used the algorithm a lot myself.)
The algorithm tracks moving objects, finds their bounding rectangles (which the application draws), counts the number of pixels in each object, and correlates them across frames as the same object (with an int ID). You may need to do a trivial conversion of your grayscale image to RGB (by copying the gray values to all three channels), since the algorithm was designed for color input.
When it comes to commercial computer vision applications, OpenCV and the Point Cloud Library (PCL) are your best friends. Articles like the one linked explain how to use tools like OpenCV to accomplish full-stack motion tracking. (The pure Java implementation shows how it works down to the individual pixels.)
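The single-image part of the question (keep pixels whose value falls in the body-heat range, group touching pixels into objects, keep objects of a plausible size, and take their bounding rectangles) can be sketched in plain Java. This is a simple connected-component labeling by flood fill, not the tracking algorithm from the linked article; the class name `HotBlobFinder` and the thresholds are hypothetical, and the image is assumed to be already loaded into an `int[][]` of gray values (for example via `ImageIO.read(...)` and `getRaster().getSample(x, y, 0)`).

```java
import java.awt.Rectangle;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

public final class HotBlobFinder {
    // Returns bounding boxes of 4-connected regions whose pixel values lie in [lo, hi]
    // and whose pixel count lies in [minPixels, maxPixels]. gray is indexed [y][x].
    public static List<Rectangle> find(int[][] gray, int lo, int hi, int minPixels, int maxPixels) {
        int h = gray.length, w = gray[0].length;
        boolean[][] seen = new boolean[h][w];
        List<Rectangle> boxes = new ArrayList<>();
        ArrayDeque<int[]> stack = new ArrayDeque<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (seen[y][x] || !inRange(gray[y][x], lo, hi)) continue;
                // Flood-fill one region, tracking its extent and pixel count.
                int minX = x, maxX = x, minY = y, maxY = y, count = 0;
                seen[y][x] = true;
                stack.push(new int[]{x, y});
                while (!stack.isEmpty()) {
                    int[] p = stack.pop();
                    count++;
                    minX = Math.min(minX, p[0]);
                    maxX = Math.max(maxX, p[0]);
                    minY = Math.min(minY, p[1]);
                    maxY = Math.max(maxY, p[1]);
                    int[][] neighbors = {{p[0] + 1, p[1]}, {p[0] - 1, p[1]}, {p[0], p[1] + 1}, {p[0], p[1] - 1}};
                    for (int[] n : neighbors) {
                        int nx = n[0], ny = n[1];
                        if (nx >= 0 && ny >= 0 && nx < w && ny < h && !seen[ny][nx] && inRange(gray[ny][nx], lo, hi)) {
                            seen[ny][nx] = true;
                            stack.push(n);
                        }
                    }
                }
                // Size filter: drop specks and regions too large to be one object.
                if (count >= minPixels && count <= maxPixels) {
                    boxes.add(new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1));
                }
            }
        }
        return boxes;
    }

    private static boolean inRange(int v, int lo, int hi) {
        return v >= lo && v <= hi;
    }
}
```

`boxes.size()` gives the object count the question mentions, and each `Rectangle` can be drawn onto the image with `Graphics2D.drawRect(r.x, r.y, r.width, r.height)`.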