I've recently begun trying to create a mobile app (iOS/Android) that will automatically beat match (http://en.wikipedia.org/wiki/Beatmatching) two songs.
I know that this exists out there, and there have been others who have had some success, but I'm running into issues related to the accuracy of the players.
Specifically, I run into "sync" issues where the "beats" don't line up. The various methods used to date are:
Calculating the BPM in advance, identifying a "beat" (using something like sonicapi.com), lining the tracks up accordingly, and bringing the incoming mix in with its playback rate adjusted (tempo adjustment)
Utilizing a bunch of metadata to trigger specific starts and stops
What does NOT work:
Leveraging Echonest's API (it beat-matches on the server; we want to do it on the client)
Something like pydub (does not do it in realtime)
Who uses this algorithm today:
iwebdj
Traktor
Does anyone have any suggestions on how to solve this problem? I've seen lots of people do it, but doing it in real time on a mobile device seems to be an issue.
There are lots of methods for solving this problem, some of which work better than others. Matthew Davies, among many others, has published several papers on the matter. Glancing at this article seems to break down some of the steps necessary for doing this. I built a beat tracker in Matlab (unfortunately...) with a fellow student, and our goal was to create an outro/intro between 2 songs so that the tempo was seamless between them. We wanted to do this for songs that varied in BPM by a small amount (±7 or so BPM between the two). Our method went sort of like this:
Find two songs in our database that had an overlapping 'key center'. So let's say 2 songs, both in A minor.
Find this particular overlap of key centers between the two. Say 30 seconds into song 1 and 60 seconds into song 2.
Now create a beat map, using an onset-detection algorithm with peak picking; Also, this was helpful for us.
Pick the first 'beat' for each track, and overlap the two tracks at that point. Now, since they are slightly different BPM from each other, the beats won't really line up with each other.
From this, we created a sort of map that gave us the sample offsets between beats of song A and beats of song B. From this, we wanted to be able to time-stretch the fade-in region of song B so that each one of its onsets (beats in this case) lined up at the correct sample index as the onsets from song A, over ITS fade-out region. So for example, if onset 2 from song B was shown as 5,000 samples ahead of onset 2 from song A, we simply stretched that 5,000 sample region so that onset 2 matched exactly between both songs.
This seems like it would sound weird, but it actually sounded pretty good. Although this was done entirely offline in Matlab, I am also looking for a way to do this in real-time in a mobile app. Not entirely sure about libraries you can use for this in Android world, but I imagine that it would be most efficient in C++.
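The offset/stretch bookkeeping itself is small, though. Here is a minimal sketch in Python/numpy of the idea (the onset positions are made up, and the actual time-stretching, e.g. with a phase vocoder, is not shown):

    import numpy as np

    # Hypothetical onset positions (in samples) inside the crossfade region,
    # one array per song; in practice these come from your onset detector.
    onsets_a = np.array([0, 22050, 44100, 66150])   # song A, fade-out region
    onsets_b = np.array([0, 21000, 43000, 64500])   # song B, fade-in region

    # For each inter-onset segment of song B, the ratio it would need to be
    # time-stretched by so its end lands on song A's corresponding onset.
    seg_a = np.diff(onsets_a)
    seg_b = np.diff(onsets_b)
    stretch_ratios = seg_a / seg_b   # >1 means stretch, <1 means compress

    for i, r in enumerate(stretch_ratios):
        print(f"segment {i}: stretch song B by {r:.4f}")
    # A phase vocoder (or any other time-stretching routine) would then be
    # applied per segment; that part is not shown here.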
A couple of libraries I have come across would be good for prototyping something, or at least studying the source code to get a better understanding of how you could do this in a mobile app:
Essentia (great community, open-source)
Aubio (also seems to be maintained pretty well, open-source)
Additional things to read up on for doing this kind of stuff in iOS land:
vDSP Programming guide
This article may also help
I came across this project that is doing some beat detection. Although it seems pretty out-dated unfortunately, it may offer some additional insights.
Unfortunately it isn't as simple as just 'pressing play' at the same time to align beats, unless you are assuming very specific aspects about them (exact tempos, etc.).
If you reallllly have some time on your hands, you should check out the thesis of Tristan Jehan (a founder of Echonest); it is jam-packed with algorithms and methods for beat detection, etc.
I am currently doing my dissertation, which involves two people: a professional athlete and an amateur. First, using image-processing skeletonization, I would like to record the professional athlete while performing the squat exercise; then, when the amateur performs the exercise, I want to be able to compare the professional's skeleton with the amateur's to see if the movement is properly formed.
I'm open to any suggestions and opinions, and would gladly appreciate some help.
Here lies your question:
properly formed.
What does "properly performed" actually mean? How can this be quantified?
Bear in mind I'm not athletic/experienced in this field.
If I were given the task I would counter-intuitively go in the opposite direction:
moving away from Processing 3/Kinect/computer. I would instead:
find a professional athlete
find a trainer skilled in functional mobility training.
find an amateur (probably easiest)
Item 2 will be trickier. For example, FMS seems to put a lot of emphasis on correct exercising and mobility (to enhance performance and reduce the risk of injuries). I'm not sure if that's the only approach or the best one. You might want to check opinions on Physical Fitness, consult with people studying/teaching exercise science, etc. Do check credentials, as it feels like a field where everyone has an opinion/preference.
The idea is to understand how a professional, educated trainer assesses correct movement. Take note of how that works in the real world and try to systemise it.
What are the cues for a correct execution?
Is it the key poses?
The motion in between?
How the skeletal and muscular systems work together, the weights/forces applied, etc.?
Having a better understanding of how this works in the real world should lead you to things you can start quantifying/comparing numerically on a computer.
Try to make a checklist/score system manually using a pen and paper based on the information you gather. If this works you already have a system you can start programming.
The next step is acquiring the data.
This is probably where the Kinect comes in, but bear in mind:
the second version of the Kinect is more precise than the first
there is a Kinect v2 SDK wrapper for Processing 3: use that if you can (Windows only). There is a way to get libfreenect2 working with OpenNI on OS X/Linux and therefore with SimpleOpenNI in Processing, but it's not straightforward and you won't get the same precision from the skeleton-tracking algorithm
use data that is as precise as possible:
you can get the accuracy of a tracked skeleton joint
use an environment that doesn't contain a complex background (this makes it easier to segment users and detect/track skeletons, with little chance of mistaking something else for a person). Prefer artificial, non-incandescent light (less of a problem with Kinect v2, but you still want as little IR interference as possible).
comparing orientation matrices or joints on single poses might not be enough to get the full picture: how do you capture/quantify motion, taking into account the things the Kinect can't easily see (muscles flexing, forces applied, a moving centre of gravity, etc.)?
try to use a grid system that will make it simple to pair the digital values with real world measurements. Check out how people used to study motion in the past, for example Étienne-Jules Marey or Eadweard Muybridge
Motion capture by Étienne-Jules Marey
Motion study by Eadweard Muybridge (notice the grid)
It's a pretty full-on project to get right, involving bits of anatomy/physics/kinematics/etc.
Start with the research first:
how did people study this in the past?
what are the current developments?
how does it work in the real world (without computers)?
Take your constraints into account:
what resources (people/gear/etc.) can you use?
how much time do you have available?
Given the above, what topic/section of the project can realistically be tackled to get useful results?
Overall probably something along these lines:
background research
real world studies
comparison system has feature which can be measured both with kinect and by a person
record data (real-world data + mobility comparison evaluation, and Kinect data + mobility comparison)
compare data
write an evaluation of findings (how effective is the system? what are its limitations? what could be improved (future work)? etc.)
In short, be aware of the Kinect's limitations: skeleton tracking is probability-based, so it's not 100% accurate. Use data that's as clean/correct as possible to begin with (it's easier to acquire good data if you can control the capture environment). Of what a real trainer would track, what could you track with a Kinect? Do a comparison of the intersecting measurements.
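To give a flavour of the "quantify it on a computer" step, here is a tiny sketch in Python/numpy that compares a single knee angle between the two performers. The joint coordinates are made up; real data would come from the Kinect skeleton tracker, and a proper comparison would run over whole aligned sequences (e.g. with dynamic time warping) rather than one frame.

    import numpy as np

    def joint_angle(a, b, c):
        """Angle in degrees at joint b, formed by the segments b->a and b->c."""
        v1 = np.asarray(a, float) - np.asarray(b, float)
        v2 = np.asarray(c, float) - np.asarray(b, float)
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Hypothetical hip/knee/ankle positions (metres) for one frame of each performer.
    pro     = {"hip": (0.0, 1.0, 0.0), "knee": (0.0, 0.55, 0.15), "ankle": (0.0, 0.1, 0.1)}
    amateur = {"hip": (0.0, 1.0, 0.0), "knee": (0.0, 0.60, 0.05), "ankle": (0.0, 0.1, 0.1)}

    pro_angle = joint_angle(pro["hip"], pro["knee"], pro["ankle"])
    ama_angle = joint_angle(amateur["hip"], amateur["knee"], amateur["ankle"])
    print(f"knee angle: pro {pro_angle:.1f} deg, amateur {ama_angle:.1f} deg, "
          f"difference {abs(pro_angle - ama_angle):.1f} deg")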
I would like to do some audio and video analysis in Java.
In a bit more detail, I would like to identify the points in audio/video that have either been monotonous for quite some time or have drastically changed compared to some previous state.
If you want to look at it in a mathematical way, I can try to explain it like this:
Example:
You have an audio file. You should extract the waveform of that
audio file. You could try to approximate that waveform with some
simpler function, that can be expressed as a closed formula. Let's
call that function f(t).
Now, to find out how your function behaves (is it increasing or decreasing) at some point or interval, I guess I could use the first derivative, f'(t). If I'd like even more information, I assume the second derivative, f''(t), would also come in handy.
So, if we assume we can do that, then I guess I'd have one piece of information about the audio.
However, if I'm not mistaken, audio files can also have spectrograms, so I'm unsure how they fall into all of this.
So, the real question goes here: Is there a way to do this in Java (efficiently)? I've been doing some digging and I've found MusicG, however, the last update date is July 2012, which leads me to believe this may be abandoned.
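As a rough illustration of the f(t)/f'(t) idea, here is a sketch in Python/numpy (kept short on purpose; the same per-frame arithmetic is easy to port to Java, and the file name and thresholds are made up):

    import numpy as np
    from scipy.io import wavfile   # any decoder works; a plain WAV is assumed here

    rate, samples = wavfile.read("clip.wav")        # hypothetical file
    if samples.ndim > 1:
        samples = samples.mean(axis=1)              # mix down to mono
    samples = samples.astype(float)

    frame = 1024
    n = len(samples) // frame
    # Short-time energy: a crude stand-in for the "simpler function f(t)".
    energy = np.array([np.sum(samples[i*frame:(i+1)*frame] ** 2) for i in range(n)])

    d1 = np.diff(energy)   # ~ f'(t): is the loudness rising or falling?
    d2 = np.diff(d1)       # ~ f''(t): how quickly that trend itself changes

    # "Monotonous" stretches: runs of frames where |f'(t)| stays small.
    flat = np.abs(d1) < 0.05 * np.abs(d1).max()
    print("fraction of frames with little change:", flat.mean())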
The second part refers to video files, but without their audio component.
This is where I'll have more questions, so I'm just gonna go and shoot them.
How do you identify points of change in "pace" in videos?
Here's an example:
Imagine the video shows car driver's point of view while he's driving
on a long, straight road. Since the surroundings are mostly the same,
the pace could be described as "not changing much". At one point, the
road begins to curve but the driver, due to him falling asleep, is not
following the road that precisely, so the surroundings start to change
somewhat, and so does the pace. At the apex of that curve there is a
tree, which grows bigger and bigger as the car is approaching it.
Here, the POV (and the pace) is changing quite a lot, since the tree
is getting bigger and bigger. In the end, the car crashes into a tree,
all hell breaks loose, the car starts to roll uncontrollably, which
indicates a really intense pace.
I'm assuming one way could be to do image segmentation and somehow determine which portions of the frames are changing, and how big those portions are, to try to determine the pace, but I'd like additional input.
If anyone has had prior experience doing any sort of related work in Java, what approaches did you explore and/or use? One thing that immediately comes to my mind is JavaCV, but as I said, with my limited experience, I'm unsure what to actually try.
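For what it's worth, here is a minimal frame-differencing sketch of the "how much is changing between frames" idea (Python + OpenCV for brevity; JavaCV exposes the equivalent calls, and the file name and thresholds are made up):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("drive.mp4")             # hypothetical clip
    ok, prev = cap.read()
    assert ok, "could not read the clip"
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    pace = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)
        # Fraction of pixels that changed noticeably since the previous frame.
        pace.append(np.mean(diff > 25))
        prev = gray
    cap.release()

    pace = np.array(pace)
    print("calm frames:", np.mean(pace < 0.05), "hectic frames:", np.mean(pace > 0.5))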
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single StackOverflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which is easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explication including code samples, check this GameDev article out.
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux".
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
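A minimal, deliberately naive sketch of steps 1-4 (spectral flux, peak picking, IOI histogram) might look like the following in Python/numpy; the file name is hypothetical, the peak picker is crude, and the agent-based tracking of steps 5-7 is not shown:

    import numpy as np
    from scipy.io import wavfile

    rate, x = wavfile.read("track.wav")             # hypothetical WAV file
    if x.ndim > 1:
        x = x.mean(axis=1)                          # mix down to mono
    x = x.astype(float)

    hop, win = 512, 1024
    frames = range(0, len(x) - win, hop)
    spec = np.array([np.abs(np.fft.rfft(x[i:i+win] * np.hanning(win))) for i in frames])

    # Step 2: spectral flux = summed positive magnitude increases per frame.
    flux = np.maximum(np.diff(spec, axis=0), 0).sum(axis=1)

    # Step 3: very naive peak picking -> onset frame indices.
    onsets = [i for i in range(1, len(flux) - 1)
              if flux[i] > flux[i-1] and flux[i] > flux[i+1]
              and flux[i] > flux.mean() + flux.std()]

    # Step 4: histogram of inter-onset intervals, converted to a BPM guess.
    onset_times = np.array(onsets) * hop / rate
    iois = np.diff(onset_times)
    iois = iois[(iois > 0.25) & (iois < 1.5)]       # keep the 40-240 BPM range
    hist, edges = np.histogram(iois, bins=50)
    best_ioi = edges[hist.argmax()] + (edges[1] - edges[0]) / 2
    print("tempo estimate: %.1f BPM" % (60.0 / best_ioi))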
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL.
There are many different algorithms that do this but they all are fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (High Frequency Content), or you can look for large changes in spectral content (Spectral Difference). (There are a couple of papers that I recommend you look into, further down.) Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms that you can use once you've got that time localization of the beat. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when your beat happens), then apply an FFT to that and, BAM, the largest peak gives you the frequency of onsets.
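A toy version of that pulse-train idea, as a sketch in Python/numpy (the onsets are faked at a known tempo purely so the example is self-contained; in practice they would come from your onset detector):

    import numpy as np

    fs = 100.0                     # analysis rate: 100 onset-detection frames per second
    duration = 30.0                # hypothetical 30-second excerpt
    n = int(fs * duration)

    # Pulse train: 1 at every detected onset frame, 0 elsewhere.
    # Here the onsets are placed every 0.5 s (120 BPM) for illustration.
    pulse = np.zeros(n)
    pulse[(np.arange(0, duration, 0.5) * fs).astype(int)] = 1.0

    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Restrict to a plausible tempo range (30-300 BPM = 0.5-5 Hz) and take the peak.
    band = (freqs >= 0.5) & (freqs <= 5.0)
    tempo_hz = freqs[band][spectrum[band].argmax()]
    print("tempo: %.1f BPM" % (tempo_hz * 60.0))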
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned looking into applying a machine learning algorithm: Basically collect a bunch of features from the onset detection functions (mentioned above) and combine them with the raw signal in a neural network/logistic regression and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures; there is actually an online distance course).
If you can manage to interface with Python code in your project, the Echo Nest Remix API is a pretty slick API for Python:
There's a method analysis.tempo which will give you the BPM. It can do a whole lot more than simple BPM, as you can see from the API docs or this tutorial
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
another edit: It is worth noting that lossy compression formats like mp3 store Fourier-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: The easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
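A toy console version of that, assuming the user hits Enter on each beat (Python, just to show the arithmetic):

    import time

    taps = []
    print("Hit Enter on each beat; type q then Enter to stop.")
    while input() != "q":
        taps.append(time.time())

    if len(taps) > 1:
        # (number of intervals) / (elapsed time), converted to beats per minute
        bpm = 60.0 * (len(taps) - 1) / (taps[-1] - taps[0])
        print("about %.0f BPM" % bpm)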
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them; it has a good reputation and it's written in C with a C++ wrapper, so you can integrate it easily with a Cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other one. Once the similarity matrix is computed, you can get the average similarity between all sample pairs {S(T); S(T+L)} for each lag L: this is the beat spectrum. The first high peak in the beat spectrum is, most of the time, the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
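Here is a rough sketch of the beat-spectrum idea in Python/numpy, using STFT magnitude frames and cosine similarity as the feature/similarity choices (the paper uses different features, so treat this as an approximation; the file name is hypothetical, and the global maximum is used as a crude stand-in for "the first high peak"):

    import numpy as np
    from scipy.io import wavfile

    rate, x = wavfile.read("track.wav")             # hypothetical file
    if x.ndim > 1:
        x = x.mean(axis=1)
    x = x.astype(float)[: rate * 20]                # short excerpt keeps the matrix small

    hop, win = 1024, 2048
    feats = np.array([np.abs(np.fft.rfft(x[i:i+win] * np.hanning(win)))
                      for i in range(0, len(x) - win, hop)])
    feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12   # unit norm

    sim = feats @ feats.T                           # cosine similarity matrix

    # Beat spectrum: average similarity S(T, T+L) over T, for each lag L.
    max_lag = int(2.0 * rate / hop)                 # look up to 2 s ahead
    lags = np.arange(1, max_lag)
    beat_spec = np.array([np.mean(np.diag(sim, k)) for k in lags])

    valid = lags * hop / rate >= 0.25               # ignore lags shorter than 0.25 s
    best_lag = lags[valid][np.argmax(beat_spec[valid])]
    period = best_lag * hop / rate
    print("beat period ~ %.2f s (~%.0f BPM)" % (period, 60.0 / period))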
I'd imagine this will be easiest in 4/4 dance music, as there should be a single low-frequency thud about twice a second.
For the last week I've been researching and experimenting with facial recognition. The intended application is for a person to be able to look up a person's information in a database (SQL) by simply taking a picture of their face. The initial expectation was to be able to compress a face down to a key or hash and use this as the database lookup. This need not be extremely accurate, as the person looking up the information can, and most likely will, end up doing a final comparison between the original image on file and the person standing in front of them.
OpenCV/JavaCV seems to be the obvious starting point, and the facial detection that it provides works well, however the implementation of Eigenfaces for facial recognition isn't ideal because online training by recompiling hundreds of thousands of user faces every time a new face needs to be added to the training set wouldn't work.
I am experimenting with using SURF descriptors on a face extracted using OpenCV's Haar Cascade features, and this appears to get me closer to the intended result, however I am unable to think of a way to efficiently lookup and compare roughly 30 descriptors (which are either 64 or 128 dimensional vectors) in a database. I've done some reading about LSH and Spectral Hashing algorithms, however there are no implementations to be found for Java and my math isn't strong enough to implement them myself.
Does anyone have any thoughts or ideas on how this might be accomplished, or if it is even possible?
Hashing isn't complicated, nor do you need a degree in maths.
Assuming that any two images will result in a fairly similar number of 'descriptors', it only requires that you get a reasonable match with enough of them to reach a high enough confidence factor.
How specific these descriptors are determines what level of collision you can accept in your hashing algorithm.
As you have several of them, I would suggest that you don't need anything too sophisticated - after all, you probably want a level of 'fuzziness' in your search?
Start with something simple - experiment and refine. You might even find that you'll need different hashing for different descriptors - i.e. some might be more specific than others?
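To make that concrete, one very simple scheme is random-hyperplane hashing: project each descriptor onto a handful of fixed random directions, keep only the signs as a short bit key, and let a query vote over bucket collisions. A minimal sketch in Python/numpy follows; all the names and parameters (16 bits, 64-dimensional descriptors, a single table) are arbitrary choices, and a real system would use several tables and also probe nearby keys.

    import numpy as np
    from collections import defaultdict

    DIM, BITS = 64, 16
    rng = np.random.default_rng(0)
    planes = rng.normal(size=(BITS, DIM))           # fixed random hyperplanes

    def lsh_key(descriptor):
        """16-bit sketch of a 64-d descriptor: the sign of each projection."""
        bits = (np.asarray(descriptor, float) @ planes.T) > 0
        return int("".join("1" if b else "0" for b in bits), 2)

    index = defaultdict(list)                       # hash key -> [(person_id, descriptor)]

    def add_face(person_id, descriptors):
        for d in descriptors:
            index[lsh_key(d)].append((person_id, d))

    def query(descriptors):
        """Vote for the person whose stored descriptors collide most often."""
        votes = defaultdict(int)
        for d in descriptors:
            for person_id, _ in index.get(lsh_key(d), []):
                votes[person_id] += 1
        return max(votes, key=votes.get) if votes else None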
Hopefully some food for thought.
I'm coming to the end of my first year of CS and I thought a great way to consolidate all the things I've learnt this year would be a personal game project.
I would like to implement a 2D-based RTS; I'm thinking along the lines of StarCraft I, Warcraft II or even Command & Conquer. I will have about 3 months without interruptions to implement the game.
So to anyone experienced with java game programming, I have a few questions:
Is it realistic to design a 2D RTS engine from scratch in 3 months?
If so what are some good books/resources to get started?
Would it be better to modify some existing project? I would think the experience of having to work with a lot of someone else's code would be good since our exposure to such topics in an undergrad cs degree seems very rare, if non-existent.
Are there any decent open-source 2D RTS projects that anyone could recommend? I've looked through a few, but most seem to be written in C/C++.
My humble thanks
Edit: Thanks for the quick responses, I think that perhaps it was a bad idea to post this in a rush since I think I misrepresented what I want to do.
When I say "along the lines of Warcraft II etc." I mean more like that style of RTS using sprites. I don't intend to implement a game nearly that complex, more like just a basic prototype.
My goal would be something more like a flat textured map with some basic obstacles like trees, and a single unit-producing structure like a barracks. I'd like the units to have health bars, be able to move, attack and die (and possibly morph into another unit).
Far-off goals would be to implement some basic pathing using a modified version of Dijkstra's shortest-path algorithm, ranged units with missile attacks, etc.
I don't plan to implement any opponents, AI, networking, or anything like that.
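(For the pathfinding goal above, a plain grid version of Dijkstra's algorithm is small enough to sketch; note that with uniform step costs it degenerates to breadth-first search, and A* is the usual refinement. Python here just to keep it short; the grid and coordinates are made up.)

    import heapq

    def dijkstra(grid, start, goal):
        """Shortest path on a 2D grid; grid[y][x] == 1 marks an obstacle."""
        h, w = len(grid), len(grid[0])
        dist, prev = {start: 0}, {}
        heap = [(0, start)]
        while heap:
            d, (x, y) = heapq.heappop(heap)
            if (x, y) == goal:
                break
            if d > dist[(x, y)]:
                continue                            # stale heap entry
            for nx, ny in ((x+1, y), (x-1, y), (x, y+1), (x, y-1)):
                if 0 <= nx < w and 0 <= ny < h and not grid[ny][nx]:
                    nd = d + 1                      # uniform cost per step
                    if nd < dist.get((nx, ny), float("inf")):
                        dist[(nx, ny)] = nd
                        prev[(nx, ny)] = (x, y)
                        heapq.heappush(heap, (nd, (nx, ny)))
        path, node = [], goal
        while node in prev or node == start:        # walk back from the goal
            path.append(node)
            if node == start:
                break
            node = prev[node]
        return path[::-1]

    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    print(dijkstra(grid, (0, 0), (3, 2)))           # route around the wall of obstacles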
I'm thinking along the lines of StarCraft I, Warcraft II or even Command & Conquer
Make sure you purge your mind of matching the full scope of any of those. They took large teams of developers multiple years to make, with multi-million dollar budgets, so you can't even hope to approach those. They're called AAA for a reason. That being said, there's no reason you can't very minimally ape their design, or make a tiny game in their genre, assuming you have previous experience making small games.
A sub-genre of RTS that might be doable in that amount of time is a Tower Defense game. Plants vs Zombies is a good example. The reason I suggest this sub-genre is that you can avoid implementing any sort of AI or path-finding, which are notoriously difficult to get working, and I think technically impossible to implement "perfectly", especially with a limited CPU budget.
Make sure to rein in your scope. Favor a "complete" game over new features, because you can then call it "done" at any time. Get your game playable ASAP, and don't sweat the polish or details until you have to. Add one enemy type and one type of player unit (with only one ability, if you were thinking of implementing multiple abilities per unit). Make a title screen, menus (even if the menu is just "click screen to play"), a game over screen, level complete or stat screens, cross-level player statistics, etc. Once you have all that ironed out, spend equal time adding new features and polishing the gameplay/graphics/bugs.
Once you have a playable, "complete" game ready (no matter how small in scope), find a real artist to do graphics for you. A shiny game always draws an audience, no matter how simple the gameplay.
It is very unrealistic to think you could implement a 2D RTS engine anywhere even close to the complexity in those kind of games. You could maybe get something very rough if you were experienced, but with only one year I think it's doubtful.
I can't help but feel like it would be much better for you if you used an existing engine or framework and built off of it. Like you said, working with other code would probably be a good learning experience as well. It would allow you to experiment without getting bogged down in having to do everything.
Keep it simple or you will simply drown in complexity before getting around to have anything playable. Since you have not tried it before, you will have a lot of nuts to crack and you don't know how long they will take.
Also remember that report writing and documentation takes time too.
The idea is good, and I think you can pull off a whole game if you find good building blocks. I would suggest discussing this with your teacher to hear what is acceptable for you to use. Would it e.g. be ok to do a game on an open source engine if you add some non-trivial functionality?
Update: Seems to be several engines available from Java at http://www.devmaster.net/engines/list.php?fid=6&sid=1
People often forget that creating games is MUCH MORE than just coding the technical part. It's about content creation, game design, sound and music, the "fun factor". If you make heavy use of existing APIs or engines it will be possible, but writing it from scratch with no experience in 3 months is like asking yourself whether you can code 100,000 LOC in that time, which means about 1,111 LOC per day. This might be possible, but not if you also have to design and think, and just having the code makes no game.
Perhaps it would make sense to look at some existing efforts to get a feel for the scope of what you are looking at. These should give you some ideas or even code to build on:
http://www.duncanjauncey.com/btinternet/old/javagame/game.html
http://en.wikipedia.org/wiki/Lightweight_Java_Game_Library
http://www.ardor3d.com/
http://en.wikipedia.org/wiki/JMonkeyEngine
It would be a lot for me to bite off (from scratch) in the time given that is for sure. That is about all I can say.
EDIT: I thought maybe JOGRE was not what you are looking for. Then I thought about it and it seems like it would have all the right kinds of plumbing for what you are trying to do.
EDIT AGAIN: After my answer, one of the related questions links on the side seemed relevant: Java Game Programming: JOGL vs LWJGL?
Well, if it gives you any hope at all, my team and I are currently working on an RTS game called "The Genesis Project". We call ourselves MotherBoard Games, or MBG for short. If you would like, I am always looking for more coders. You can email me at mpmn5891#gmail.com; I can give you some advice and tips from my 6 years of experience, 2 of which have been spent making this game (to give you a sense of the scope).