I’m trying to work out a math/geometry problem in a Java project I’m working on.
Here is the scenario:
There are two sets of blocks, each with a different number of blocks and different dimensions. In this example Set A has 5 blocks, each is 20x20 pixels; Set B has 6 blocks, and each is 25x50 pixels:
I’m trying to come up with a way to mathematically or logically determine how those sets would line up to maximize the contact between them. If you were to line these sets up end-to-end, it would look like this:
In this image, 4 of the blocks in set B are in contact with the blocks in set A. However, if you shift set A to the right a bit, you can get 5 of the blocks in set B to touch:
The problem is that the formula / algorithm / logic needs to be flexible enough to handle different combinations. In this example, set C has only 3 blocks, and each block is 40x40:
Any ideas?
Center the two sets of blocks and shift one of them by a small amount.
Check the difference in total length between the two sets of blocks.
If the difference is less than the length of the smaller block, then just align the two sets of blocks on one edge; they all have contact with each other, so call it good.
Otherwise, move the smaller set of blocks sideways by almost the length of one member of the other set (i.e., length of larger block minus some tiny number), to maximize contact.
It will look a bit like this (where top blocks are width 5, and bottom blocks are width 3):
111112222233333444445555566666
--->111222333444555
If you're only after the simpler question of "how many blocks are in contact?", the computation is easier. Every block in the shorter set always has contact with the longer set. When the edges are exactly aligned, the number of blocks in the longer set that are in contact is:
(length of shorter set of blocks) / (length of a single block in the longer set)
Add one if that value is both less than the number of blocks in the longer set and not an integer (to account for a tiny shift like the one described earlier), then round up.
It's a little difficult to come up with a good algorithm for this without understanding what the program is actually trying to do ... but ok, you want to 'maximise' the contact between two lists of blocks (or are they really sets?).
One thing that occurs to me here is that the best alignment will have at least one of the separators between blocks aligned. So you could just keep the longer list fixed, and shift the shorter one along, stepping by separator alignment.
Let a_total and b_total be the total widths of the collections of blocks. Let a_single and b_single be the width of one of the blocks. We can assume a_total <= b_total (otherwise swap).
If the rows of blocks are aligned at their left edges, A is in contact with ceiling(a_total/b_single) blocks from B. That number can be increased by at most one by shifting the starting point of A to the right. The increase is at most one because the situation is periodic for large B (imagine an infinitely long B, for example): shifting A by exactly b_single results in a configuration exactly the same as the starting configuration, so another B block has been added to the end.
The trick now is to see whether we can add a B block to the end by shifting the A collection, while not removing the B block at the beginning.
We can add a B block at the end only if B is long enough; the exact condition is a_total <= b_total - b_single.
We can avoid removing a B block from the beginning if we can shift the A collection by less than b_single in order for the right edge of the A collection to pass a B block boundary, in other words if and only if ceiling(a_total/b_single)*b_single - a_total < b_single, i.e., ceiling(a_total/b_single) < (a_total + b_single)/b_single, i.e., ceiling(a_total/b_single) < a_total/b_single + 1. The latter inequality is always true.
In summary, the number of blocks in contact is maximized at ceiling(a_total/b_single) + 1 if a_total <= b_total - b_single, and ceiling(a_total/b_single) otherwise (assuming, of course, that a_total <= b_total).
There is one further issue you need to consider: the above analysis holds when you can adjust the relative positions of the blocks by any real number. If you are restricted to one pixel adjustments, then you may get into further special cases if b_single = 1, for example.
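As a minimal Java sketch of the formula above (the method and parameter names are mine; widths are measured along the direction of contact, and the caller is assumed to pass the shorter set as A):

static int maxContacts(double aTotal, double bTotal, double bSingle) {
    // Number of B blocks touched when the left edges are aligned.
    int aligned = (int) Math.ceil(aTotal / bSingle);
    // A small shift gains one more B block only if B extends far enough.
    boolean roomToGainOne = aTotal <= bTotal - bSingle;
    return roomToGainOne ? aligned + 1 : aligned;
}

For the sets in the question, taking 25 as the width of a B block along the contact direction, maxContacts(5 * 20, 6 * 25, 25) returns 5, which matches the shifted arrangement described above.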
Consider this code, which calculates a power of a double x:
public static double F1(double x, int k) {
    if (k == 1) { return x; } // O(1)
    return x * F1(x, k - 1);  // O(k)
}
I have concluded that
the number of operations in if (k==1) { return x; } is 2: the if-statement and the return-statement. Thus, T(1) = 2
the number of operations in return x * F1(x, k-1); is 4: the return-statement = 1, the *-operator = 1, and F1(x, k-1) = 2. So the first part of the equation = 4
We have one recursive call in x * F1(x, k-1), so x = 1.
We reduce the problem by 1 in each recursive call, so y = k-1. So the second part of the equation = T(k-1)
Putting this all together, we get:
T(k) = 4 + T(k-1), with T(1) = 2
But how do I proceed from here to find the exact runtime?
I tried to look at this question for an explanation, but it focused on how to calculate the Big-O notation, and not the exact time complexity. How do I proceed to find the exact time-complexity?
The answer here should be:
Exact: 4k-2
Tilde: 4k
Big-O: O(k)
But I don't know what they did to arrive at this.
But how do I proceed from here to find the exact runtime?
You toss everything you did so far in the garbage and fire up JMH instead, see later for more on that.
It is completely impossible to determine exact runtime based on such academic analysis. Exact runtime depends on which song is playing in your music player, whether your OS is busy doing some disk cleanup, sending a ping to the network time server, which pages so happen to be on the on-die caches, which CPU core your code ends up being run on, and the phase of the moon.
Let me say this as clearly as I can: Something like 4k - 2 is utterly irrelevant and misguided - that's just not how computers work. You can't say that an algorithm with 'exact runtime' 4k - 2 will be faster than a 6k + 2 algorithm. It is equally likely to be slower: It holds zero predictive power. It's a completely pointless 'calculation'. It means nothing. There's a reason big-O notation exists: That does mean something regardless of hardware vagary: Given 2 algorithms such that one has a 'better' big-O notation than the other, there exists some input size such that the better algorithm WILL be faster, regardless of hardware concerns. It might be a really big number, and big-O does nothing whatsoever to tell you at what input size this occurs.
The point of big-O notation is that it dictates with mathematical certainty what will eventually happen if you change the size of the input to your algorithm, in very broad strokes. It is why you remove all constants and everything but the largest factor when showing a big-O notation.
Take a graph; on the X-axis, there's 'input size', which is the 'k' in O(k). On the Y-axis, there's execution time (or if you prefer, max. memory load). Then, make up some input size and run your algorithm a few times. Average the result, and place a dot on that graph. For example, if you are running your algorithm on an input of k=5, and it takes 27ms on average, put a dot on x=5, y=27.
Keep going. Lots of dots. Eventually those dots form a graph. The graph will, near the x=0 point, be all over the place. As if a drunk with a penchant for randomness is tossing darts at a board.
But, eventually (and when 'eventually' kicks in is impossible to determine, as, again, it depends on so many OS things, don't bother attempting to predict such things), it'll start looking like a recognizable shape. We define these shapes in terms of simplistic formulas. For example, if it eventually (far enough to the right) coalesces into something that looks like what you'd get if you graph y=x^2, then we call that O(x^2).
Now, y=5x^2 looks exactly like y=x^2. For that matter, y=158*x^2 + 25000x + 2134931239, if you look far enough to the right on that curve, looks exactly like y=x^2. Hence why O(158x^2+20x) is completely missing the point, and therefore incorrect. The point of O is merely to tell you what it'll look like 'far enough to the right'.
This leaves us with precisely 2 useful performance metrics:
O(k) notation. Which you correctly determined here: This algorithm has an O(k) runtime.
A timing report. There is no point trying to figure this out by looking at the code, you need to run the code. Repeatedly, with all sorts of guards around it to ensure that hotspot optimization isn't eliminating your code completely, re-running lots of times to get a good average, and ensuring that we're past the JVM's JIT step. You use JMH to do this, and note that the result of JMH, naturally, depends on the hardware you run it on, and that's because programs can have wildly different performance characteristics depending on hardware.
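For example, a minimal JMH sketch could look like the following (this assumes the usual jmh-core and jmh-generator-annprocess dependencies are set up; the class name, the parameter values, and the x = 2.0 argument are just illustrative):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class F1Benchmark {
    @Param({"10", "100", "1000"})
    public int k; // input size, varied so you can see how timing scales

    @Benchmark
    public double powerOfX() {
        return F1(2.0, k); // the method from the question, unchanged
    }

    public static double F1(double x, int k) {
        if (k == 1) { return x; }
        return x * F1(x, k - 1);
    }
}

JMH forks a fresh JVM, runs warm-up iterations so the JIT has settled, and reports averages with error bounds, which is exactly the kind of timing report meant above.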
For the first k-1 steps you execute:
the comparison k==1
the subtraction k-1
the product x * ...
the return instruction
In the last step you execute:
the comparison k==1
the return instruction
So you have 4*(k-1)+2 = 4k-2 overall instructions.
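To see how that count also follows from the recurrence T(k) = 4 + T(k-1), T(1) = 2, you can unroll (telescope) it; each unrolling step adds 4 and reduces k by one:

T(k) = 4 + T(k-1)
     = 4 + 4 + T(k-2)
     = ...
     = 4*(k-1) + T(1)
     = 4*(k-1) + 2
     = 4k - 2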
EDIT: As @rzwitserloot correctly pointed out, the quantity you are searching for is not very meaningful, as it depends on how the code is compiled and executed. Above I've just tried to figure out what your teacher meant by "exact time-complexity".
Ok, I need to write a Java algorithm which simulates the SMOOTH function from IDL. But I'm not quite sure how that algorithm works. The smoothing equation is given by:
I know there is already a similar post regarding boxcar averaging. But the algorithm seems to be different.
What I understand from this equation is that there are two cases (an if-statement): the first one calculates the weighted average, the second one ignores the boundary.
In the first case, I think I understand the summation notation: it runs from 0 to (w - 1).
What I don't get is the term inside the summation, A[i + j - w/2].
The following is sample data (just a corner of a larger dataset) that was calculated using IDL. I used a width of 5 to calculate this.
Please explain how this algorithm works.
Thanks
You want the i'th average to be from a window around the i'th point. So it has to start before that point, and end after.
Subtracting off w/2 in the index causes j=0 to be the start of the window you want, and j=w-1 to be the end of the window you want.
It would be entirely equivalent to sum from j=-w/2 to j=w/2-1 instead.
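As a hedged Java sketch of how I read the first branch (a boxcar average of width w, with w assumed odd as in the width-5 example; boundary points where a full window does not fit are left unchanged, which is how I read the second branch):

static double[] smooth(double[] a, int w) {
    double[] out = a.clone();          // boundary values stay as they are
    int half = w / 2;
    for (int i = half; i < a.length - half; i++) {
        double sum = 0.0;
        for (int j = 0; j < w; j++) {  // indices run from i - w/2 to i + w/2
            sum += a[i + j - half];
        }
        out[i] = sum / w;              // plain average over the window
    }
    return out;
}

Comparing the output of something like this against the IDL sample data should show quickly whether the edge handling matches.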
I am looking for a way to shuffle a large amount of data which does not fit into memory (approx. 40GB).
I have around 30 million entries, of variable length, stored in one large file. I know the starting and ending positions of each entry in that file. I need to shuffle this data, which does not fit in RAM.
The only solution I thought of is to shuffle an array containing the numbers from 1 to N, where N is the number of entries, with the Fisher-Yates algorithm and then copy the entries in a new file, according to this order. Unfortunately, this solution involves a lot of seek operations, and thus, would be very slow.
Is there a better solution to shuffle large amount of data with uniform distribution?
First get the shuffle issue out of the way. Do this by inventing a hash algorithm for your entries that produces random-like results, then do a normal external sort on the hash.
Now that you have transformed your shuffle into a sort, your problem turns into finding an efficient external sort algorithm that fits your budget and memory limits. That should be as easy as a Google search.
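As a sketch of what such a random-like key could look like (this is a SplitMix64-style bit mixer applied to the entry index; the function name and the idea of keying on the index are mine, not part of the answer above):

// Produces a well-scrambled 64-bit sort key from an entry index.
// Sorting (key, entry) pairs with any external sort then yields a shuffle.
static long randomLikeKey(long entryIndex, long seed) {
    long x = entryIndex ^ seed;
    x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
    x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
    return x ^ (x >>> 31);
}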
A simple approach is to pick a K such that 1/K of the data fits comfortably in memory. Perhaps K=4 for your data, assuming you've got 16GB RAM. I'll assume your random number function has the form rnd(n) which generates a uniform random number from 0 to n-1.
Then:
for i = 0 .. K-1:
    Initialize your random number generator to a known state.
    Read through the input data, generating a random number rnd(K) for each item as you go.
    Retain items in memory whenever rnd(K) == i.
    After you've read the input file, shuffle the retained data in memory.
    Write the shuffled retained items to the output file.
This is very easy to implement, will avoid a lot of seeking, and is clearly correct.
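A rough Java sketch of this multi-pass approach, simplified to fixed-size records (the real data is variable-length, so you would read entry boundaries from your offset table instead; the file names, the record size, and the seed are all illustrative assumptions):

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ExternalShuffle {
    static final int RECORD_SIZE = 1024; // assumed fixed record length
    static final int K = 4;              // number of passes over the input
    static final long SEED = 42L;        // reused so every pass buckets identically

    public static void main(String[] args) throws IOException {
        try (OutputStream out = new BufferedOutputStream(
                Files.newOutputStream(Paths.get("shuffled.bin")))) {
            for (int pass = 0; pass < K; pass++) {
                Random bucketRnd = new Random(SEED); // known state, same every pass
                List<byte[]> retained = new ArrayList<>();
                try (InputStream in = new BufferedInputStream(
                        Files.newInputStream(Paths.get("input.bin")))) {
                    byte[] record = new byte[RECORD_SIZE];
                    // readNBytes needs Java 9+
                    while (in.readNBytes(record, 0, RECORD_SIZE) == RECORD_SIZE) {
                        if (bucketRnd.nextInt(K) == pass) { // rnd(K) == i
                            retained.add(record.clone());
                        }
                    }
                }
                Collections.shuffle(retained); // in-memory shuffle of ~1/K of the data
                for (byte[] r : retained) {
                    out.write(r);              // append this pass's shuffled slice
                }
            }
        }
    }
}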
An alternative is to partition the input data into K files based on the random numbers, and then go through each, shuffling in memory and writing to disk. This reduces disk IO (each item is read twice and written twice, compared to the first approach where each item is read K times and written once), but you need to be careful to buffer the IO to avoid a lot of seeking, it uses more intermediate disk, and is somewhat more difficult to implement. If you've got only 40GB of data (so K is small), then the simple approach of multiple iterations through the input data is probably best.
If you use 20ms as the time for reading or writing 1MB of data (and assuming the in-memory shuffling cost is insignificant), the simple approach will take 40*1024*(K+1)*20ms, which is about 1 hour 8 minutes (assuming K=4). The intermediate-file approach will take 40*1024*4*20ms, which is around 55 minutes, assuming you can minimize seeking. Note that an SSD is approximately 20 times faster for reads and writes (even ignoring seeking), so you should expect to perform this task in well under 10 minutes using an SSD. Numbers from Latency Numbers Every Programmer Should Know.
I suggest keeping your general approach, but inverting the map before doing the actual copy. That way, you read sequentially and do scattered writes rather than the other way round.
A read has to be done when requested before the program can continue. A write can be left in a buffer, increasing the probability of accumulating more than one write to the same disk block before actually doing the write.
Premise
From what I understand, using the Fisher-Yates algorithm and the data you have about the positions of the entries, you should be able to obtain (and compute) a list of:
struct Entry {
    long long sourceStartIndex;
    long long sourceEndIndex;
    long long destinationStartIndex;
    long long destinationEndIndex;
}
Problem
From this point onward, the naive solution is to seek each entry in the source file, read it, then seek to the new position of the entry in the destination file and write it.
The problem with this approach is that it uses way too many seeks.
Solution
A better way to do it is to reduce the number of seeks by using two large buffers, one for each file.
I recommend a small buffer for the source file (say 64MB) and a big one for the destination file (as big as the user can afford - say 2GB).
Initially, the destination buffer will be mapped to the first 2GB of the destination file. At this point, read the whole source file, in chunks of 64MB, in the source buffer. As you read it, copy the proper entries to the destination buffer. When you reach the end of the file, the output buffer should contain all the proper data. Write it to the destination file.
Next, map the output buffer to the next 2GB of the destination file and repeat the procedure. Continue until you have written the whole output file.
Caution
Since the entries have arbitrary sizes, it's very likely that at the beginning and ending of the buffers you will have suffixes and prefixes of entries, so you need to make sure you copy the data properly!
Estimated time costs
The execution time depends, essentially, on the size of the source file, the RAM available to the application, and the reading speed of the HDD. Assuming a 40GB file, 2GB of RAM, and a 200MB/s HDD read speed, the program will need to read 800GB of data (40GB * (40GB / 2GB)). Assuming the HDD is not highly fragmented, the time spent on seeks will be negligible. This means the reads alone will take about an hour! But if, luckily, the user has 8GB of RAM available for your application, the time may decrease to only 15 to 20 minutes.
I hope this will be enough for you, as I don't see any other faster way.
Although you can use external sort on a random key, as proposed by OldCurmudgeon, the random key is not necessary. You can shuffle blocks of data in memory, and then join them with a "random merge," as suggested by aldel.
It's worth specifying what "random merge" means more clearly. Given two shuffled sequences of equal size, a random merge behaves exactly as in merge sort, with the exception that the next item to be added to the merged list is chosen using a boolean value from a shuffled sequence of zeros and ones, with exactly as many zeros as ones. (In merge sort, the choice would be made using a comparison.)
Proving it
My assertion that this works isn't enough. How do we know this process gives a shuffled sequence, such that every ordering is equally possible? It's possible to give a proof sketch with a diagram and a few calculations.
First, definitions. Suppose we have N unique items, where N is an even number, and M = N / 2. The N items are given to us in two M-item sequences labeled 0 and 1 that are guaranteed to be in a random order. The process of merging them produces a sequence of N items, such that each item comes from sequence 0 or sequence 1, and the same number of items come from each sequence. It will look something like this:
0: a b c d
1: w x y z
N: a w x b y c d z
Note that although the items in 0 and 1 appear to be in order, they are just labels here, and the order doesn't mean anything. It just serves to connect the order of 0 and 1 to the order of N.
Since we can tell from the labels which sequence each item came from, we can create a "source" sequence of zeros and ones. Call that c.
c: 0 1 1 0 1 0 0 1
By the definitions above, there will always be exactly as many zeros as ones in c.
Now observe that for any given ordering of labels in N, we can reproduce a c sequence directly, because the labels preserve information about the sequence they came from. And given N and c, we can reproduce the 0 and 1 sequences. So we know there's always one path back from a sequence N to one triple (0, 1, c). In other words, we have a reverse function r defined from the set of all orderings of N labels to triples (0, 1, c) -- r(N) = (0, 1, c).
We also have a forward function f from any triple (0, 1, c) that simply re-merges 0 and 1 according to the values in c. Together, these two functions show that there is a one-to-one correspondence between triples and orderings of N.
But what we really want to prove is that this one-to-one correspondence is exhaustive -- that is, we want to prove that there aren't extra orderings of N that don't correspond to any triple, and that there aren't extra triples that don't correspond to any ordering of N. If we can prove that, then we can choose orderings of N in a uniformly random way by choosing triples (0, 1, c) in a uniformly random way.
We can complete this last part of the proof by counting bins. Suppose every possible triple gets a bin. Then we drop every ordering of N in the bin for the triple that r(N) gives us. If there are exactly as many bins as orderings, then we have an exhaustive one-to-one correspondence.
From combinatorics, we know that the number of orderings of N unique labels is N!. We also know that the number of orderings of 0 and 1 are both M!. And we know that the number of possible sequences c is N choose M, which is the same as N! / (M! * (N - M)!).
This means there are a total of
M! * M! * N! / (M! * (N - M)!)
triples. But N = 2 * M, so N - M = M, and the above reduces to
M! * M! * N! / (M! * M!)
That's just N!. QED.
Implementation
To pick triples in a uniformly random way, we must pick each element of the triple in a uniformly random way. For 0 and 1, we accomplish that using a straightforward Fisher-Yates shuffle in memory. The only remaining obstacle is generating a proper sequence of zeros and ones.
It's important -- important! -- to generate only sequences with equal numbers of zeros and ones. Otherwise, you haven't chosen from among Choose(N, M) sequences with uniform probability, and your shuffle may be biased. The really obvious way to do this is to shuffle a sequence containing an equal number of zeros and ones... but the whole premise of the question is that we can't fit that many zeros and ones in memory! So we need a way to generate random sequences of zeros and ones that are constrained such that there are exactly as many zeros as ones.
To do this in a way that is probabilistically coherent, we can simulate drawing balls labeled zero or one from an urn, without replacement. Suppose we start with fifty 0 balls and fifty 1 balls. If we keep count of the number of each kind of ball in the urn, we can maintain a running probability of choosing one or the other, so that the final result isn't biased. The (suspiciously Python-like) pseudocode would be something like this:
def generate_choices(N, M):
    n0 = M
    n1 = N - M
    while n0 + n1 > 0:
        if randrange(0, n0 + n1) < n0:
            yield 0
            n0 -= 1
        else:
            yield 1
            n1 -= 1
This might not be perfect because of floating point errors, but it will be pretty close to perfect.
This last part of the algorithm is crucial. Going through the above proof exhaustively makes it clear that other ways of generating ones and zeros won't give us a proper shuffle.
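For reference, here is a rough Java translation of the same urn-drawing idea (the class and method names are mine; ThreadLocalRandom is just one convenient source of bounded random longs):

import java.util.concurrent.ThreadLocalRandom;

class ChoiceGenerator {
    private long n0; // zeros still in the urn
    private long n1; // ones still in the urn

    ChoiceGenerator(long n, long m) {
        this.n0 = m;
        this.n1 = n - m;
    }

    boolean hasNext() {
        return n0 + n1 > 0;
    }

    // Draws 0 or 1 without replacement, so exactly m zeros are emitted in total.
    int next() {
        if (ThreadLocalRandom.current().nextLong(n0 + n1) < n0) {
            n0--;
            return 0;
        } else {
            n1--;
            return 1;
        }
    }
}

Because the draw here is integer-based, the zero/one counts come out exactly balanced.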
Performing multiple merges in real data
There remain a few practical issues. The above argument assumes a perfectly balanced merge, and it also assumes you have only twice as much data as you have memory. Neither assumption is likely to hold.
The first turns out not to be a big problem because the above argument doesn't actually require equally sized lists. It's just that if the list sizes are different, the calculations are a little more complex. If you go through the above replacing the M for list 1 with N - M throughout, the details all line up the same way. (The pseudocode is also written in a way that works for any M greater than zero and less than N. There will then be exactly M zeros and N - M ones.)
The second means that in practice, there might be many, many chunks to merge this way. The process inherits several properties of merge sort — in particular, it requires that for K chunks, you'll have to perform roughly K / 2 merges, and then K / 4 merges, and so on, until all the data has been merged. Each batch of merges will loop over the entire dataset, and there will be roughly log2(K) batches, for a run time of O(N * log(K)). An ordinary Fisher-Yates shuffle would be strictly linear in N, and so in theory would be faster for very large K. But until K gets very, very large, the penalty may be much smaller than the disk seeking penalties.
The benefit of this approach, then, comes from smart IO management. And with SSDs it might not even be worth it — the seek penalties might not be large enough to justify the overhead of multiple merges. Paul Hankin's answer has some useful tips for thinking through the practical issues raised.
Merging all data at once
An alternative to doing multiple binary merges would be to merge all the chunks at once -- which is theoretically possible, and might lead to an O(N) algorithm. The random number generation algorithm for values in c would need to generate labels from 0 to K - 1, such that the final outputs have exactly the right number of labels for each category. (In other words, if you're merging three chunks with 10, 12, and 13 items, then the final value of c would need to have 0 ten times, 1 twelve times, and 2 thirteen times.)
I think there is probably an O(N) time, O(1) space algorithm that will do that, and if I can find one or work one out, I'll post it here. The result would be a truly O(N) shuffle, much like the one Paul Hankin describes towards the end of his answer.
Logically partition your database entries (e.g., alphabetically).
Create indexes based on the partitions you created.
Build a DAO to access the data based on the index.
I'm trying to implement a Poisson solver for image blending in Java. After discretization with the 5-star method, the real work begins.
To do that, I perform these three steps on the color values:
apply the sine transform to rows and columns
multiply by the eigenvalues
apply the inverse sine transform to rows and columns
This works so far.
To do the sine transform in Java, I'm using the Apache Commons Math package.
But the FastSineTransformer has two limitations:
the first value in the array must be zero (well, that's OK; number two is the real problem)
the length of the input must be a power of two
So right now my excerpts have lengths 127, 255, and so on to fit in (I'm inserting a zero at the beginning, so that both 1 and 2 are fulfilled). That's pretty stupid, because I want to choose the size of my excerpt freely.
My Question is:
Is there a way to extend my array e.g. of length 100 to fit the limitations of the Apache FastSineTransformer?
In the FastFourierTransformer class it is mentioned that you can pad with zeros to get a power of two. But when I do that, I get wrong results. Perhaps I'm doing it wrong, but I really don't know if there is anything I have to keep in mind when padding with zeros.
As far as I can tell from http://books.google.de/books?id=cOA-vwKIffkC&lpg=PP1&hl=de&pg=PA73#v=onepage&q&f=false and the source http://grepcode.com/file/repo1.maven.org/maven2/org.apache.commons/commons-math3/3.2/org/apache/commons/math3/transform/FastSineTransformer.java?av=f, the rules are as follows:
According to the implementation, the dataset size should be a power of 2, presumably so the algorithm can guarantee O(n*log(n)) execution time.
According to James S. Walker, the function must be odd, that is, the assumptions mentioned must be fulfilled, and the implementation relies on that.
According to the implementation, for some reason the first and the middle element must be 0:
x'[0] = x[0] = 0,
x'[k] = x[k] if 1 <= k < N,
x'[N] = 0,
x'[k] = -x[2N-k] if N + 1 <= k < 2N.
As for your case, where the dataset may not have a power-of-two length, I suggest you resize it and pad the gaps with zeroes without violating the rules above. But I suggest referring to the book first.
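As a hedged sketch of what that padding could look like against the commons-math3 API linked above (the helper name is mine, and whether STANDARD_DST_I or ORTHOGONAL_DST_I is the right normalization depends on how the eigenvalue step is formulated):

import org.apache.commons.math3.transform.DstNormalization;
import org.apache.commons.math3.transform.FastSineTransformer;
import org.apache.commons.math3.transform.TransformType;

static double[] padToPowerOfTwo(double[] data) {
    int n = 1;
    while (n < data.length) {
        n <<= 1;                 // next power of two >= data.length
    }
    double[] padded = new double[n];
    System.arraycopy(data, 0, padded, 0, data.length);
    padded[0] = 0.0;             // first value must be zero
    return padded;
}

// Usage (untested sketch):
// FastSineTransformer dst = new FastSineTransformer(DstNormalization.STANDARD_DST_I);
// double[] forward = dst.transform(padToPowerOfTwo(myData), TransformType.FORWARD);

Note that for a Poisson solver the padding itself changes the problem: the sine transform diagonalizes the Laplacian for a grid of the padded size, so the eigenvalues have to correspond to the padded length, which may be exactly why zero padding gave wrong results.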
ADDED INFO:
I'm using the inside of a square as an arena. On start-up, the square spawns with a random position and rotation, and I can't access any of the square's attributes.
I then have a moving object inside the square that I'm building AI for, and I want the object to 'learn' where the arena walls are. Every time the object bumps into a wall, I get a touch return, so I know whether it hit or not. I'm using this to record the global position of where the object hit the wall and save it ... After 3 hits on the same wall, I want to mathematically 'draw a straight line' through those dots, which will represent the arena wall - with this, I can tell my object not to go near these coordinates.
The reason for 3 dots? Well, if the object hit one wall and then hit another wall, I would have a line drawn from one wall to the other, giving false data about where the wall is.
If Java sees three (or more) dots in a line, it knows that the object has hit the same wall (just further along it).
CONTINUED:
I'm trying to map out lines from given coordinate data. Basically I have an array that holds X and Y coordinates, and I want to be able to mathematically detect whether they make up a straight line (give or take a few pixels). (The coordinates are on the border of a square.)
For example, the array might be like this:
[x0][y0] - 1,1
[x1][y1] - 2,2
[x2][y2] - 5,5
Which would represent a diagonal line on one side of the square, like so:
But sometimes I might get one coordinate from one side of the square, and then one from another side, all mixed up (and not necessarily at a 90 degree angle either!). So I want to be able to run through the array and detect which coordinates make up a line (or the border of one side of the square), like so:
So right now, I have a 2D array:
private double wallLocations[][] = new double[10][10];
and a nested loop that doesn't do the job. I don't really know where to even start with this one:
for(int r = 0; r < wallIndex; r++){
    for(int c = 0; c < wallIndex; c++){
        int index = 0;
        if(wallLocations[r][index] == wallLocations[r][c] &&
           wallLocations[r][index + 1] == wallLocations[r][c] &&
           wallLocations[r][index + 2] == wallLocations[r][c]){
            System.out.println("***** Wall Here! *****");
            index++;
        }
    }
}
---- UPDATE ----
Here's a better example of what I'm looking for. The red dots represent the coordinates coming in; a line is detected when 3 or more dots line up (if it were 2 dots, it would detect any and every pair of dots) ... You can see that this is starting to look like the border of a square?
This seems to essentially be a clustering problem, and those can be generally pretty hard. Part of the reason clustering is hard is that there may be more than one applicable mapping.
For instance (please forgive my bad ascii art):
X X X
X X X
X X X X
could be mapped
X---X---X X X X
\ / \ / \
X---X---X or X X X
/ \ / \ \
X---X---X---X X X X X
I've seen uses of the Expectation Maximization algorithm using mixed Gaussian models used for this kind of thing (when there were a lot of points but only a few expected lines) but you generally do have to give that algorithm a definite number of clusters, and while its results are good, it's a pretty slow algorithm requiring possibly many iterations. I'm kinda thinking I've seen something generally faster that's some sort of image processing algorithm but I'd have to do some research.
I'm kinda wondering about something where you find y=mx+b for every pair of points and then sort them over m and b. It might be advantageous to find the angle θ in [0,pi) for each pair instead and sort over the angles instead of m, or maybe better, cluster by cos(2θ) -- the point of that being that the group of lines {y= -0.0001x + 1, y = 1, and y = 0.0001x + 1} are very similar, the group of lines {y= -10000x + 10, x = 0, and y = 10000x - 10} are also very similar, but cos(2θ) should put the two groups as far apart as possible, as any two pairs between the groups should be nearly perpendicular.
Also note, in my example, b doesn't matter much for the lines nearly perpendicular to the x axis, so "b" might not be so useful for direct clustering.
I guess, perhaps, there may be some usable measure of "distance" between two lines; I'm just not sure what it would be. Two lines that are nearly parallel and converge "on screen" (where the points generally are) perhaps ought to be considered "closer" than if they converge a trillion units away from the screen -- or should they? Strictly speaking, three lines can never pairwise be considered closer to one another if none of them are parallel (if they're on a plane, they'll all meet somewhere), but intuitively, if we have two lines that are generally one inch apart in the area we're concerned with, we'd pick that pair as closer over two identically pointed lines that are a mile apart in the area of concern. That makes me think maybe the area between the lines, as bounded by our area of concern, ought to be used as a metric.
Sorry, I'm not sure how useful all that brainstorming might be, but it might put a different light on things.
Edit: You know what, a better answer might possibly be found by researching this:
http://en.wikipedia.org/wiki/Hough_transform
Edit II & III:
Ok,... the situation you've just described is a lot simpler and less generic (although, to be honest, I think I misread your initial query to be more generic than it really was).
You've got 4 candidate walls. Let your AI bounce around until it finds three co-linear points. That should be a simple test of combinations. Assign those three points a wall. Depending upon what other points you have, you actually might be able to determine or at least estimate the other three walls (assuming it is a square). If you have 5 points with 3 on separate walls, you should be able to calculate the distance between walls, and therefore the likely position of the 4th wall. To test if the other two points are on separate walls, make sure they're pair-wise not co-linear with a line perpendicular or parallel to the line defined by your wall, or if they are on a line parallel, test to see if the distance between them is less than the distance between the wall and them (if that's the case, they're on the wall opposite of the first candidate wall). Given that they are on separate walls, either one is facing the first found wall, or they're on walls perpendicular to that wall. Either way you can find the lines defining the walls with a little tricky geometry.
(and actually, to determine the dimensions, I don't think you need to even test to see that you have 3 co-linear points... I think you just need to test to see that you've made two turns... which takes 4 points minimum but likely more if you're unlucky. two of the points would have to be determinable to be on a different wall from the other two, which means really big bounces!)
There's a bit of math involved, and I'm a bit too tired to explain further tonight, and I don't know how much of the geometry of points around a square you want to take advantage of, because you wouldn't be able to use those properties in a more general case, so I'll leave it at that, and maybe also remove some of my other previous brainstorm cruft later.
If you have two points you can calculate the slope of the connecting line with Math.atan2.
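And as a hedged sketch of the "give or take a few pixels" test for three points (the method name and the tolerance parameter are mine): compare the area of the triangle the points form against a pixel tolerance, which avoids the divide-by-zero you'd get from comparing raw slopes on a vertical wall.

static boolean roughlyCollinear(double x1, double y1,
                                double x2, double y2,
                                double x3, double y3,
                                double tolerancePx) {
    // Twice the signed area of the triangle (cross product of two edge vectors).
    double cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1);
    // Dividing by the length of the P1-P3 edge turns the area into the distance
    // from P2 to the line through P1 and P3, so the tolerance reads in pixels.
    double base = Math.hypot(x3 - x1, y3 - y1);
    return base > 0 && Math.abs(cross) / base <= tolerancePx;
}

The heading of the detected wall then comes from Math.atan2(y3 - y1, x3 - x1), as mentioned above.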