Finding the peaks of a spectrogram - java

I am currently working on a project for my 2nd year. I am supposed to code a tuner in Java, and I have chosen to do a guitar tuner.
After looking around on the internet, I found Java code to do an FFT. I changed it a bit, understood it and tested it. I know it works fine (I made a graph of it and looked at the different peaks using simple sine functions).
I am now trying to find the fundamental frequency. From what I understand, this frequency is given by the first peak.
I would thus like to create a method that finds for instance the first 5 peaks of my FFT and gives them to me with their indexes.
I first did a simple method that compares each pair of adjacent points of my spectrogram; where the sign of the difference changes, that's where I know there is a peak. This method works great with ideal signals (without any noise). However, it becomes completely useless if I add noise.
I am really bad at Java (I actually started with this project, and the simple function I described above is basically my masterpiece... just so you get an idea of my level).
Can anyone help me? I would really appreciate it! :)
Thanks in advance!
Have a great day!
fireangel

I'd say your best bet is going to be to read in all the values as an array, then run over them and 'smooth' them using a rolling average of some kind.
Afterwards, you'll have a much smoother curve. Find your peaks using this curve, then go back to your original data and use the peak indexes to find the actual peak there.
pseudocode:
// Your raw data
int[] data = getData();
// This is an array to hold your 'smoothed' data
int[] newData = new int[data.length];
// Iterate over your data and smooth it with a five-point moving average,
// clamping the window at the edges so we never read out of bounds
for (int i = 0; i < data.length; i++) {
    int sum = 0, count = 0;
    for (int j = Math.max(0, i - 2); j <= Math.min(data.length - 1, i + 2); j++) {
        sum += data[j];
        count++;
    }
    newData[i] = sum / count;
}
// Use your existing peak finding function on your smoothed data, and get
// an array of the indexes where your peaks occur
int[] peakIndexes = yourPeakFindingFunction(newData);
// Create an array to hold your final values
int[] peakValues = new int[peakIndexes.length];
// Iterate over your peak indexes and get the original data's value at that location
for (int i = 0; i < peakIndexes.length; i++) {
    peakValues[i] = data[peakIndexes[i]];
}
Very basic and very brute-force, but it should get you on the right track for an assignment.
You'll need to play with the algorithms for smoothing the data so it's representative and for finding the actual peak at the location indicated by the smoothed data (as it won't be exact).
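For that last step, something like this would do (a rough sketch only; the +/- 3 window size is just a guess you'd need to tune):

// Refine a smoothed peak against the raw data: search a small window
// around the smoothed index for the actual raw maximum
static int refinePeak(int[] raw, int smoothedIndex) {
    int best = Math.max(0, smoothedIndex - 3);
    for (int i = best; i <= Math.min(raw.length - 1, smoothedIndex + 3); i++) {
        if (raw[i] > raw[best]) {
            best = i;
        }
    }
    return best;
}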

Related

Efficient Intersection and Union of Lists of Strings

I need to efficiently find the ratio of (intersection size / union size) for pairs of Lists of strings. The lists are small (mostly about 3 to 10 items), but I have a huge number of them (~300K) and have to do this on every pair, so I need this actual computation to be as efficient as possible. The strings themselves are short unicode strings -- averaging around 5-10 unicode characters.
The accepted answer here Efficiently compute Intersection of two Sets in Java? looked extremely helpful but (likely because my sets are small (?)) I haven't gotten much improvement by using the approach suggested in the accepted answer.
Here's what I have so far:
protected double uuEdgeWeight(UVertex u1, UVertex u2) {
    Set<String> u1Tokens = new HashSet<String>(u1.getTokenlist());
    List<String> u2Tokens = u2.getTokenlist();
    int intersection = 0;
    int union = u1Tokens.size();
    for (String s : u2Tokens) {
        if (u1Tokens.contains(s)) {
            intersection++;
        } else {
            union++;
        }
    }
    return (double) intersection / union;
}
My question is: is there anything I can do to improve this, given that I'm working with Strings, which may be more time-consuming to check for equality than other data types?
I think that because I'm comparing multiple u2's against the same u1, I could get some improvement by doing the cloning into a HashSet outside of the loop (which isn't shown -- meaning I'd pass in the HashSet instead of the object from which I currently pull the list and then clone into a set).
Anything else I can do to squeak out even a small improvement here?
Thanks in advance!
Update
I've updated the numeric specifics of my problem above. Also, due to the nature of the data, most (90%?) of the intersections are going to be empty. My initial attempt at this used the clone-the-set-then-retainAll approach to find the intersection, and shortcut out before doing the clone and addAll to find the union. That was about as efficient as the code posted above, presumably because of the trade-off between it being a slower algorithm overall and being able to shortcut out a lot of the time. So, I'm thinking about ways to take advantage of the infrequency of overlapping sets, and would appreciate any suggestions in that regard.
Thanks in advance!
You would get a large improvement by moving the HashSet outside of the loop.
If the HashSet really only has a few entries in it, then you are probably just as fast using an array, since traversing an array is much simpler/faster. I'm not sure where the threshold would lie, but I'd measure both - and be sure that you do the measurements correctly (i.e. warm-up loops before timed loops, etc.).
One thing to try might be using a sorted array for the things to compare against. Scan until you go past the current element, and you can immediately abort the search. That will improve processor branch prediction and reduce the number of comparisons a bit.
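Something along these lines (an illustrative sketch; it assumes the array you compare against has been sorted up front):

// Linear scan over a sorted array, aborting as soon as we pass
// the element we are looking for
static boolean containsSorted(String[] sorted, String probe) {
    for (String s : sorted) {
        int cmp = s.compareTo(probe);
        if (cmp == 0) {
            return true;   // found it
        }
        if (cmp > 0) {
            return false;  // went past it; it can't be here
        }
    }
    return false;
}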
If you want to optimize for this function (not sure if it actually works in your context) you could assign each unique String an Int value, when the String is added to the UVertex set that Int as a bit in a BitSet.
This function should then become a set.or(otherset) and a set.and(otherset). Depending on the number of unique Strings that could be efficient.
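A rough sketch of that idea (the TokenIndex class and its names are made up for illustration; you would keep one shared index for all vertices so bit positions stay consistent):

import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class TokenIndex {
    // Maps each unique token to a stable bit position
    private final Map<String, Integer> ids = new HashMap<String, Integer>();

    int idFor(String token) {
        Integer id = ids.get(token);
        if (id == null) {
            id = ids.size();
            ids.put(token, id);
        }
        return id;
    }

    BitSet toBits(Iterable<String> tokens) {
        BitSet bits = new BitSet();
        for (String t : tokens) {
            bits.set(idFor(t));
        }
        return bits;
    }
}

// intersection size / union size via bit operations
static double ratio(BitSet a, BitSet b) {
    if (!a.intersects(b)) {
        return 0.0;  // cheap early exit for the ~90% empty-intersection case
    }
    BitSet and = (BitSet) a.clone();
    and.and(b);
    BitSet or = (BitSet) a.clone();
    or.or(b);
    return (double) and.cardinality() / or.cardinality();
}

Since you said most intersections are empty, the intersects() check gives you the shortcut you were looking for before any cloning happens at all.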

Finding neighbors to Points in an ArrayList

I've recently started learning Java and thought doing a "Conway's Game of Life" style program would be a good thing to start out with. Everything works fine, but I'm having some serious performance issues with this part:
static List<Point> coordList = new ArrayList<Point>();

public int neighbors(int x, int y) {
    int n = 0;
    Point[] tempArray = { new Point(x - 1, y - 1), new Point(x, y - 1), new Point(x + 1, y - 1),
                          new Point(x - 1, y),     new Point(x + 1, y),
                          new Point(x - 1, y + 1), new Point(x, y + 1), new Point(x + 1, y + 1) };
    for (Point p : tempArray) {
        if (coordList.contains(p))
            n++;
    }
    return n;
}
The method is used when iterating over the ArrayList coordList, which is filled with Points, and checking how many neighbors each element has. When the list size gets to about 10000 Points, every cycle takes about 1 second, and at 20000 Points it takes 7 seconds.
My question is, what would be a more effective way to do this? I know there are several other programs of this kind with source code available to look at, but I want to do as much as I can by myself, since the point of the project is me learning Java. Also, I don't want to use a regular array because of the limitations.
If your points are unique, you could store them in a HashSet instead of an ArrayList. The contains method will then become an O(1) operation vs. O(n) in your current setup. That should speed up that section significantly.
Apart from the declaration, your code should remain mostly unchanged, as both implement the Collection interface, unless you call a List-specific method such as get(i).
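For example (a minimal sketch; only the declaration changes, and assuming these are java.awt.Point objects, which already define equals and hashCode so they hash correctly):

import java.awt.Point;
import java.util.HashSet;
import java.util.Set;

// Same collection of live cells, but contains() is now O(1) on average
static Set<Point> coordList = new HashSet<Point>();

Your neighbors method keeps working as-is, because it only calls contains.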
Performance-wise, I think your best bet is to have a plain numeric (effectively Boolean) array representing the grid. Since this is a learning exercise, I'd start with a simple one-element-per-cell array, and then perhaps progress to packing eight adjacent cells into a single byte.
It is not entirely clear what you mean by "the limitations".
The following has some interesting pointers: Optimizing Conway's 'Game of Life'
Your current code scales in a quadratic manner O(n^2). You have only given part of the program. If you look at your whole program there will be a loop that calls neighbors() and you will see that neighbors() is called n times. Also the operation contains() is linear in n, so the time is proportional to their product n*n.
Quadratic scaling is a common problem but can often be reduced to linear by using indexed data structures such as HashSet.

Collision Detection with MANY objects

I mainly focused on the graphics aspects to create a little 2D game. I've watched/looked at several tutorials, but none of them were that pleasing. I already have a player (a square) moving and colliding with other squares on the screen. Gravity etc. are also done.
If there are only as many objects as seen on the screen (30*20), everything works perfectly fine. But if I increase it to, let's say, 300*300, the program starts to run very slowly, since it has to check so many objects.
I really don't get how games like Minecraft can work with ALL THOSE blocks while my program already gives up on 300*300 blocks.
I already tried to ONLY check for collisions when the objects are visible, but that leads to the program checking every single object for its visibility, leading to the same problem.
What am I doing wrong? Help appreciated.
I'll post some code on how I handle the collisions.
player.collision(player, wall);

public void collision(Tile object1, Tile[] object2) {
    collisionCheckUp(object1, object2);
    collisionCheckDown(object1, object2);
    collisionCheckLeft(object1, object2);
    collisionCheckRight(object1, object2);
}

public void collisionCheckDown(Tile object1, Tile[] object2) {
    for (int i = 0; i < Map.tileAmount; i++) {
        if (object2[i] != null && object2[i].visible) {
            if (object1.isCollidingDown(object2[i])) {
                object1.collisionDown = true;
                return;
            }
        }
    }
    object1.collisionDown = false;
}

public void compileHullDown() {
    collisionHull = new Rectangle((int) x + 3, (int) y + 3, width - 6, height);
}

int wallCount = 0;
for (int x = 0; x < Map.WIDTH; x++) {
    for (int y = 0; y < Map.HEIGHT; y++) {
        if (Map.data[x][y] == Map.BLOCKED) {
            wall[wallCount] = new Tile(x * Map.TILE_SIZE, y * Map.TILE_SIZE);
            wallCount++;
        }
    }
}
The usual approach to optimize collision detection is to use a space partition to classify/manage your objects.
The general idea of the approach is that you build a tree representing the space and put your objects into that tree, according to their positions. When you calculate the collisions, you traverse the tree. This way, you will have to perform significantly less calculations than using the brute force approach, because you will be ignoring all objects in branches other than the one you're traversing. Minecraft and similar probably use octrees for collision (and maybe for rendering too).
The most common space-partitioning structures are BSP trees and kd-trees (a special type of BSP tree). The simpler approach for a start would be a uniform space partition - split your space into a grid of axis-aligned cells of equal size.
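A very basic uniform grid could look like this (a sketch only; it borrows the Tile type from your question and assumes it exposes x and y fields, and the cell size of 64 is arbitrary):

import java.util.ArrayList;
import java.util.List;

class UniformGrid {
    static final int CELL = 64;  // cell size in world units; tune to your tile size
    private final List<Tile>[][] cells;

    @SuppressWarnings("unchecked")
    UniformGrid(int worldWidth, int worldHeight) {
        int cols = worldWidth / CELL + 1;
        int rows = worldHeight / CELL + 1;
        cells = new List[cols][rows];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < rows; j++) {
                cells[i][j] = new ArrayList<Tile>();
            }
        }
    }

    void insert(Tile t) {
        cells[t.x / CELL][t.y / CELL].add(t);
    }

    // Instead of testing against every wall on the map, only return the
    // tiles in the given cell and the eight cells around it
    List<Tile> nearby(int x, int y) {
        List<Tile> result = new ArrayList<Tile>();
        int cx = x / CELL, cy = y / CELL;
        for (int i = Math.max(0, cx - 1); i <= Math.min(cells.length - 1, cx + 1); i++) {
            for (int j = Math.max(0, cy - 1); j <= Math.min(cells[0].length - 1, cy + 1); j++) {
                result.addAll(cells[i][j]);
            }
        }
        return result;
    }
}

Your collisionCheckDown loop would then iterate over nearby(player.x, player.y) instead of all Map.tileAmount tiles.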
The best resource on collision that I have discovered is this book. It should clarify all your questions on the topic.
That's if you wanted to do it right. If you want to do it quick, you could just sample the color buffer around your character, or only in the movement direction to determine if an obstacle is close.
As Kostja mentioned, it will be useful for you to partition your space. However, you will need to use QuadTrees instead of Octrees, as you are working in 2D, not 3D.
Here is a nice article to get you started on QuadTrees.
You can cut your overhead by a factor of 4 by, instead of calculating collisions for up/down/left/right, calculating collisions once and using the relative positions of the two objects to find out if you hit a floor, wall, or ceiling. Another good idea is to only pay attention to the objects that are nearby - maybe once every 0.25 seconds make a list of all objects that are probably close enough to collide with in the next 0.25 seconds?
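The relative-position idea could look roughly like this (a sketch; it assumes each object can produce its bounding java.awt.Rectangle via a getBounds() method, which is a made-up name, and that the collisionUp/Left/Right flags exist alongside the collisionDown flag from your code):

import java.awt.Rectangle;

Rectangle a = player.getBounds();
Rectangle b = tile.getBounds();
Rectangle overlap = a.intersection(b);
if (!overlap.isEmpty()) {
    // The shallower axis of the overlap tells you which side was hit
    if (overlap.width < overlap.height) {
        if (a.x < b.x) { player.collisionRight = true; }  // hit a wall to the right
        else           { player.collisionLeft = true; }
    } else {
        if (a.y < b.y) { player.collisionDown = true; }   // landed on a floor (y grows downward)
        else           { player.collisionUp = true; }
    }
}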

Speeding up code - 3D array

I'm trying to improve the speed of some code I've written. I was wondering how efficient accessing data from a 3d array of integers is?
I have an array
int cube[][][] = new int[10][10][10];
which I populate with values. I then access these values several thousand times.
I was wondering, seeing as all 3D arrays are theoretically stored as 1D arrays in memory, is there a way to turn my 3D array into a 1D one? For instance, I could have cube[0] referring to the old cube[0][0][0] and cube[1] referring to the old cube[0][0][1].
I'm not sure how to go about doing it. I'm sure it's possible but my brain is worn out.
Thanks
You can create the single-dimension array as follows:
int cube[] = new int[w * h * d];
And to access an element:
int value = cube[x * h * d + y * d + z];
But I doubt it will be much faster and you're losing some convenience and safety. Before deciding to go through with this change it might be a good idea to perform some benchmark tests on your data to see if you actually have a problem and whether the change gives a sufficiently large improvement to be worth the extra complexity.
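If you do try it, it's worth wrapping the arithmetic so the index math lives in one place (a small sketch using the same formula as above; the class and its names are illustrative):

// Encapsulates the 3D-to-1D index mapping so it can't drift out of
// sync across the code base
class Flat3D {
    private final int h, d;
    private final int[] data;

    Flat3D(int w, int h, int d) {
        this.h = h;
        this.d = d;
        this.data = new int[w * h * d];
    }

    int get(int x, int y, int z) {
        return data[x * h * d + y * d + z];
    }

    void set(int x, int y, int z, int value) {
        data[x * h * d + y * d + z] = value;
    }
}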
That's exactly what Java is doing behind the scenes. A three dimensional array is simply an array of arrays of arrays. In theory you could separate the arrays into 10 two dimensional arrays or 100 one-dimensional arrays (and even into 1000 individual variables), but it would be unlikely to speed up your performance. Focus on optimizing your algorithm instead.
int cube[] = new int[X * Y * Z];
cube[i * X * Y + j * X + k] = ...
But, as others already said: It's not expected to be faster (as the calculations have to be done anyway). Let Java do its stuff for reasons of error-avoidance.
Do not do it - Java handles all this for you. You can of course make it a 1D array and then do the calculations but you will hardly beat the optimized JVM code which does the same on the background. Also - is this really causing a performance bottleneck according to a profiler? If not, you might optimize your code prematurely.
You could use a LinkedList and store a 2D array in each Node. That would be more efficient I believe.

The best way to store and access 120,000 words in java

I'm programming a java application that reads strictly text files (.txt). These files can contain upwards of 120,000 words.
The application needs to store all +120,000 words. It needs to name them word_1, word_2, etc. And it also needs to access these words to perform various methods on them.
The methods all have to do with Strings. For instance, a method will be called to say how many letters are in word_80. Another method will be called to say what specific letters are in word_2200.
In addition, some methods will compare two words. For instance, a method will be called to compare word_80 with word_2200 and needs to return which has more letters. Another method will be called to compare word_80 with word_2200 and needs to return what specific letters both words share.
My question is: Since I'm working almost exclusively with Strings, is it best to store these words in one large ArrayList? Several small ArrayLists? Or should I be using one of the many other storage possibilities, like Vectors, HashSets, LinkedLists?
My two primary concerns are 1.) access speed, and 2.) having the greatest possible number of pre-built methods at my disposal.
Thank you for your help in advance!!
Wow! Thanks everybody for providing such a quick response to my question. All your suggestions have helped me immensely. I’m thinking through and considering all the options provided in your feedback.
Please forgive me for any fuzziness; and let me address your questions:
Q) English?
A) The text files are actually books written in English. The occurrence of a word in a second language would be rare – but not impossible. I’d put the percentage of non-English words in the text files at .0001%
Q) Homework?
A) I’m smilingly looking at my question’s wording now. Yes, it does resemble a school assignment. But no, it’s not homework.
Q) Duplicates?
A) Yes. And probably every five or so words, considering conjunctions, articles, etc.
Q) Access?
A) Both random and sequential. It’s certainly possible a method will locate a word at random. It’s equally possible a method will want to look for a matching word between word_1 and word_120000 sequentially. Which leads to the last question…
Q) Iterate over the whole list?
A) Yes.
Also, I plan on growing this program to perform many other methods on the words. I apologize again for my fuzziness. (Details do make a world of difference, do they not?)
Cheers!
I would store them in one large ArrayList and worry about (possibly unnecessary) optimisations later on.
Being inherently lazy, I don't think it's a good idea to optimise unless there's a demonstrated need. Otherwise, you're just wasting effort that could be better spent elsewhere.
In fact, if you can set an upper bound to your word count and you don't need any of the fancy List operations, I'd opt for a normal (native) array of string objects with an integer holding the actual number. This is likely to be faster than a class-based approach.
This gives you the greatest speed in accessing the individual elements whilst still retaining the ability to do all that wonderful string manipulation.
Note I haven't benchmarked native arrays against ArrayLists. They may be just as fast as native arrays, so you should check this yourself if you have less blind faith in my abilities than I do :-).
If they do turn out to be just as fast (or even close), the added benefits (expandability, for one) may be enough to justify their use.
Just confirming pax's assumptions, with a very naive benchmark:
public static void main(String[] args)
{
    int size = 120000;
    String[] arr = new String[size];
    ArrayList<String> al = new ArrayList<String>(size);
    for (int i = 0; i < size; i++)
    {
        String put = Integer.toHexString(i);
        // System.out.print(put + " ");
        al.add(put);
        arr[i] = put;
    }

    Random rand = new Random();
    Date start = new Date();
    for (int i = 0; i < 10000000; i++)
    {
        int get = rand.nextInt(size);
        String fetch = arr[get];
    }
    Date end = new Date();
    long diff = end.getTime() - start.getTime();
    System.out.println("array access took " + diff + " ms");

    start = new Date();
    for (int i = 0; i < 10000000; i++)
    {
        int get = rand.nextInt(size);
        String fetch = al.get(get);
    }
    end = new Date();
    diff = end.getTime() - start.getTime();
    System.out.println("array list access took " + diff + " ms");
}
and the output:
array access took 578 ms
array list access took 907 ms
Running it a few times, the actual times vary somewhat, but generally array access is between 200 and 400 ms faster over 10,000,000 iterations.
If you will access these Strings sequentially, a LinkedList would be the best choice.
For random access, ArrayLists have a nice memory usage/access speed trade-off.
My take:
For a non-threaded program, an ArrayList is always fastest and simplest.
For a threaded program, a java.util.concurrent.ConcurrentHashMap<Integer,String> or java.util.concurrent.ConcurrentSkipListMap<Integer,String> is awesome. Perhaps you would later like to allow threads so as to make multiple queries against this huge thing simultaneously.
If you're going for fast traversal as well as compact size, use a DAWG (Directed Acyclic Word Graph.) This data structure takes the idea of a trie and improves upon it by finding and factoring out common suffixes as well as common prefixes.
http://en.wikipedia.org/wiki/Directed_acyclic_word_graph
Use a Hashtable? This will give you your best lookup speed.
ArrayList/Vector if order matters (it appears to, since you are calling the words "word_xxx"), or HashTable/HashMap if it doesn't.
I'll leave the exercise of figuring out why you would want to use an ArrayList vs. a Vector or a HashTable vs. a HashMap up to you since I have a sneaking suspicion this is your homework. Check the Javadocs.
You're not going to get any methods that help you as you've asked for in the examples above from your Collections Framework class, since none of them do String comparison operations. Unless you just want to order them alphabetically or something, in which case you'd use one of the Tree implementations in the Collections framework.
How about a radix tree or Patricia trie?
http://en.wikipedia.org/wiki/Radix_tree
The only advantage of a linked list over an array or array list would be if there are insertions and deletions at arbitrary places. I don't think this is the case here: You read in the document and build the list in order.
I THINK that when the original poster talked about finding "word_2200", he meant simply the 2200th word in the document, and not that there are arbitrary labels associated with each word. If so, then all he needs is indexed access to all the words. Hence, an array or array list. If there really is something more complex, if one word might be labeled "word_2200" and the next word is labeled "foobar_42" or some such, then yes, he'd need a more complex structure.
Hey, do you want to give us a clue WHY you want to do any of this? I'm hard pressed to remember the last time I said to myself, "Hey, I wonder if the 1,237th word in this document I'm reading is longer or shorter than the 842nd word?"
Depends on what the problem is - speed or memory.
If it's memory, the minimum solution is to write a function getWord(n) which scans the whole file each time it runs, and extracts word n.
Now - that's not a very good solution. A better solution is to decide how much memory you want to use: let's say 1000 items. Scan the file for words once when the app starts, and store a series of bookmarks containing the word number and the position in the file where it is located - do this in such a way that the bookmarks are more or less evenly spaced through the file.
Then, open the file for random access. The function getWord(n) now looks at the bookmarks to find the biggest word # <= n (please use a binary search), does a seek to get to the indicated location, and scans the file, counting the words, to find the requested word.
An even quicker solution, using rather more memory, is to build some sort of cache for the blocks - on the basis that getWord() requests usually come through in clusters. You can rig things up so that if someone asks for word #X and it's not in the bookmarks, then you seek for it and put it in the bookmarks, saving memory by consolidating whichever bookmark was least recently used.
And so on. It depends, really, on what the problem is - on what kind of patterns of retrieval are likely.
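A bare-bones sketch of the bookmark approach (all names are made up; it assumes a single-byte encoding and omits error handling, purely to show the shape of it):

import java.io.IOException;
import java.io.RandomAccessFile;

class WordFile {
    private final RandomAccessFile file;
    private final int[] bookmarkWord;   // word number at each bookmark
    private final long[] bookmarkPos;   // file offset where that word starts

    WordFile(RandomAccessFile file, int[] bookmarkWord, long[] bookmarkPos) {
        this.file = file;
        this.bookmarkWord = bookmarkWord;
        this.bookmarkPos = bookmarkPos;
    }

    String getWord(int n) throws IOException {
        // Binary search for the largest bookmarked word number <= n
        int lo = 0, hi = bookmarkWord.length - 1, best = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (bookmarkWord[mid] <= n) { best = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        // Seek to the bookmark and count words forward until word n
        file.seek(bookmarkPos[best]);
        String word = null;
        for (int i = bookmarkWord[best]; i <= n; i++) {
            word = readNextWord();
        }
        return word;
    }

    private String readNextWord() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        // Skip leading whitespace, then collect characters up to the next whitespace
        while ((c = file.read()) != -1 && Character.isWhitespace((char) c)) { }
        while (c != -1 && !Character.isWhitespace((char) c)) {
            sb.append((char) c);
            c = file.read();
        }
        return sb.toString();
    }
}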
I don't understand why so many people are suggesting an ArrayList, or the like, since you don't mention ever having to iterate over the whole list. Further, it seems you want to access them as key/value pairs ("word_348" = "pedantic").
For the fastest access, I would use a TreeMap, which will do binary searches to find your keys. Its only downside is that it's unsynchronized, but that's not a problem for your application.
http://java.sun.com/javase/6/docs/api/java/util/TreeMap.html
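Usage would be trivial (a hypothetical sketch mapping word numbers to words):

import java.util.TreeMap;

TreeMap<Integer, String> words = new TreeMap<Integer, String>();
words.put(348, "pedantic");            // word_348
int len = words.get(348).length();     // O(log n) lookup, then plain String methods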
