Find relationships between GPS data - Java

10 million users' GPS records, structured like:
userId
startGps
endGps
Each user has two GPS points, a start point and an end point. If the distance between two points from different users is less than 1 km, we define those users as a potentially close relation.
userA startGpsA endGpsA
userB startGpsB endGpsB
function relation(userGps A, userGps B)
    if (distance(A.startGps, B.startGps) < 1km || distance(A.startGps, B.endGps) < 1km ||
        distance(A.endGps, B.startGps) < 1km || distance(A.endGps, B.endGps) < 1km)
        return A,B
    return null
How can I find these relations fast?

One possible algorithm uses spatial 'buckets' to reduce computation time.
It does no special threading tricks, but it greatly reduces the number of Users to compare (depending on the size of the bucket).
The idea is to put into the same 'bucket' every user that is already not far from the others, and to create an index on the 'buckets' that allows getting adjacent 'buckets' at low cost.
Let's assume we have
class User {
    long userId;
    GPS start;
    GPS stop;
}
class GPS {
    long x;
    long y;
}
First we create a class for indexed Users:
class BucketEntity {
    User origin;
    long x;
    long y;
}
class Bucket extends HashSet<BucketEntity> {
}
For each User we will create two BucketEntity instances, one for 'start' and one for 'end'. We will store those BucketEntity objects in a specially indexed data structure that allows easy retrieval of the nearest other BucketEntity objects.
class Index extends ConcurrentHashMap<BucketEntity, Bucket> {
    // Override the 'put' implementation to correctly manage the Bucket (initially null, etc.)
}
All we need is to implement the 'hashCode' (and 'equals') methods in the BucketEntity class. The specification for 'hashCode' and 'equals' is that two BucketEntity objects are equal if they are not far from each other. We also want to be able to compute the 'hashCode' of the Buckets that are spatially adjacent to another Bucket, for a given BucketEntity.
To get the correct behavior for 'hashCode' and 'equals', a nice/fast solution is 'precision reduction'. In short, if you have 'x = 1248813' you replace it with 'x = 1248' (divide by 1000); it is like changing your GPS from metre precision to kilometre precision.
public static final long scall = 1000;

@Override
public boolean equals(Object o)
{
    if (this == o) return true;
    if (!(o instanceof BucketEntity)) return false;
    BucketEntity that = (BucketEntity) o;
    return this.x / scall == that.x / scall &&
           this.y / scall == that.y / scall;
}

// Maybe an 'int' is not enough to correctly hash your data;
// if so, you have to create your own implementation of Map
// with special "long hashCode()" support.
@Override
public int hashCode()
{
    // We put the 'x' cell in the high part and the 'y' cell in the low part,
    // so 'x' and 'y' don't conflict (as long as y / scall < scall).
    // Take extra care with the value of 'scall' relative to your data and
    // the max value of 'int'; scall == 10000 should be a maximum.
    return (int) ((this.x / scall) * scall + (this.y / scall));
}
As you can see in the hashCode() method, Buckets that are close to each other have very close hash codes; given a Bucket, you can also compute the hash codes of the spatially adjacent Buckets.
Now you can get the BucketEntity objects that are in the same Bucket as a given BucketEntity. To get the adjacent buckets, you create 9 virtual BucketEntity objects and 'get()' the Bucket (or null) for each of the 9 cells around the Bucket of your BucketEntity.
List<BucketEntity> shortListToCheck = new ArrayList<>(); // A List, not a Set!
long cx = x / scall, cy = y / scall;
for (long dx = -1; dx <= 1; dx++) {
    for (long dy = -1; dy <= 1; dy++) {
        Bucket bucket = index.get(new BucketEntity(user, (cx + dx) * scall, (cy + dy) * scall));
        if (bucket != null) {
            shortListToCheck.addAll(bucket);
        }
    }
}
This get()s all the Buckets that match the 9 virtual BucketEntity objects (some may be null). For each User in those 9 Buckets, really compute the distance the way you describe in your question.
Then play with 'scall'. As you can see, there is no real constraint on multi-threading here. Maybe the next level of algorithm optimization is an adaptive/recursive bucket size based on an adaptive scaling size.
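To make the bucket idea concrete, here is a minimal, self-contained sketch of the same technique. The class and method names (`GridIndex`, `near`) and the flat x/y metre coordinates are assumptions for illustration, not part of the original answer; coordinates are assumed non-negative because of Java's truncating division.

```java
import java.util.*;

class GridIndex {
    static final long SCALL = 1000; // cell size, ~1 km if coordinates are in metres

    // Map from a packed (cellX, cellY) key to the points falling in that cell.
    private final Map<Long, List<long[]>> buckets = new HashMap<>();

    // Pack two cell coordinates into one key; assumes cellY < 1_000_000.
    private static long key(long cellX, long cellY) {
        return cellX * 1_000_000L + cellY;
    }

    void add(long x, long y) {
        buckets.computeIfAbsent(key(x / SCALL, y / SCALL), k -> new ArrayList<>())
               .add(new long[] { x, y });
    }

    // All points within 'radius' of (x, y); only the 9 surrounding cells are scanned.
    List<long[]> near(long x, long y, double radius) {
        List<long[]> result = new ArrayList<>();
        long cx = x / SCALL, cy = y / SCALL;
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                List<long[]> cell = buckets.get(key(cx + dx, cy + dy));
                if (cell == null) continue;
                for (long[] p : cell) {
                    if (Math.hypot(p[0] - x, p[1] - y) <= radius) {
                        result.add(p);
                    }
                }
            }
        }
        return result;
    }
}
```

With 10 million users you would insert 20 million points (one start and one end per user) and, for each point, scan only its 9 cells instead of comparing against every other user. For example, after adding points at (500, 500) and (1400, 500), `near(600, 500, 1000)` returns both, while a point at (50000, 50000) is never even scanned.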


Design ideas needed for bunch of conditional if-else

I have some simple logic to implement, but I am not sure if there is a better way to design it other than simple if-else or switch statements.
There are 4 permissions (consider them boolean variables), which can be true or false. Based on various conditions (permutations of those permissions), I need to return a list of String values to be displayed in the UI for a dropdown field.
So it's like this for now:
if(!permission1 && !permission2){return list_of_strings_1;}
else if (permission1 && permission2 && !permission3){return list_of_strings_2;}
and so on. Some of them are just if statements, so multiple conditions may be true and we have to collect all the lists of strings and display them.
Those if-elses go on for quite some time (about 100 lines). Each returns a different list of strings. Most of it is NOT likely to change in the future, so too deep a design may be overkill.
But I am just wondering how experts would refactor this code (or whether they would refactor it at all). Maybe sticking to switch/if-else is OK?
I don't understand how four flags give you 100 lines of code. This can be done with a map of 16 entries (or fewer, if some combinations are invalid and can be mapped to a default). If the string representation is truly a list of strings, one for each possible permission, the solution is even more compact.
The key is an object representing the combination of permissions, and the value is the string representation for that combination. You could create a custom type for the key, but in this example, I'm just using four bits of an integer, where each bit indicates whether the permission is granted or not:
private static final int P1 = 1 << 0, P2 = 1 << 1, P3 = 1 << 2, P4 = 1 << 3;
private static final Map<Integer, String> permissionsToString = Map.ofEntries(
Map.entry( 0, "No permissions granted."),
Map.entry( 1, "Permissions 2-3 revoked."),
Map.entry( 2, "Permission 2 granted."),
...
Map.entry(14, "Permission 1 revoked"),
Map.entry(15, "Superuser"));
public static String toString(boolean p1, boolean p2, boolean p3, boolean p4) {
    int key = (p1 ? 1 : 0) << 0
            | (p2 ? 1 : 0) << 1
            | (p3 ? 1 : 0) << 2
            | (p4 ? 1 : 0) << 3;
    return permissionsToString.get(key);
}
If you don't understand bits, you can use an EnumSet or define your own value object to represent the key at a higher level. The idea is the same: map all possible combinations (2^4 = 16) to their corresponding label.
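For instance, here is a hedged sketch of the EnumSet variant; the enum name `Permission` and the labels are illustrative, not from the original:

```java
import java.util.*;

enum Permission { P1, P2, P3, P4 }

class PermissionLabels {
    // One entry per meaningful combination of granted permissions.
    private static final Map<Set<Permission>, String> LABELS = new HashMap<>();
    static {
        LABELS.put(EnumSet.noneOf(Permission.class), "No permissions granted.");
        LABELS.put(EnumSet.of(Permission.P1), "Permissions 2-4 revoked.");
        LABELS.put(EnumSet.allOf(Permission.class), "Superuser");
        // ... remaining combinations, or rely on the default below
    }

    static String labelFor(Set<Permission> granted) {
        return LABELS.getOrDefault(granted, "Unknown combination");
    }
}
```

The key is a plain `Set<Permission>`, so the lookup reads at the level of the domain rather than at the level of bits.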

Catching ArrayIndexOutOfBoundsException in backward?

I am creating a Pacman game. I have an array of size 15x15, 225 fields in total. When I move from field 224 to e.g. 225, I get an ArrayIndexOutOfBoundsException, which makes sense. So I can catch it and do some operation, let's say I set a new starting point for Pacman. But if I go from field 75 to 74, nothing happens.
So I am asking, can I somehow catch this case too and do some operation, like I mentioned above?
You should not rely on ArrayIndexOutOfBoundsException for normal logic. This exception is an indication of a programming error.
Instead, you should check the index before incrementing it:
if (currentIndex == 224) {
// "special logic"
} else {
// "usual logic"
}
This way you can also handle any "special" indexes, e.g.
if ((currentIndex + 1) % 15 == 0) {
// "special logic"
} else {
// "usual logic"
}
Another point: consider using two indexes - x and y - if you are programming a 2-D game.
Every move modifies x and/or y, which can easily "wrap around" like in pacman (e.g. 13 -> 14 -> 15 -> 1 -> 2 -> ...).
And convert the (x,y)-Pair to an index only when you need to access the field element:
// Assuming that x and y are 1-based, not 0-based:
public FieldElement getFieldElementAtPosition(final int x, final int y) {
final int index = (y - 1) * FIELD_WIDTH + x - 1;
return fieldArray[index];
}
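The wrap-around itself can be done with modular arithmetic; here is a small sketch for 1-based coordinates, using the question's 15-wide board (the method names are made up for illustration):

```java
class Board {
    static final int FIELD_WIDTH = 15;

    // Move right from 1-based x, wrapping 15 -> 1.
    static int moveRight(int x) {
        return x % FIELD_WIDTH + 1;
    }

    // Move left from 1-based x, wrapping 1 -> 15.
    static int moveLeft(int x) {
        return (x + FIELD_WIDTH - 2) % FIELD_WIDTH + 1;
    }
}
```

No exception ever fires: both edges of the board wrap symmetrically, which is exactly the Pacman behavior described above.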

Returning Array from a generic cubicEaseInEaseOut function

I need to create a "waveform" for an Android vibration pattern. For that I need an array of amplitude values, which I want to create dynamically using a cubic easeInEaseOut algorithm (below is a generic function I have been working with).
The duration of the waveform/vibration is 4000 ms and the amplitude is between 0-255.
Can someone help me create this array of amplitude values?
easeInOutCubic(t, b, c, d) {
t /= d/2;
if (t < 1) return c/2*t*t*t + b;
t -= 2;
return c/2*(t*t*t + 2) + b;
}
// t=start time (0?)
// b=start value (0?),
// c=change in value (255?),
// d=duration (4000?)
Thanks in advance!
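Not an authoritative answer, but a sketch of how such an array might be built from that function, assuming one sample per 100 ms; the class and method names here are invented:

```java
class VibrationWave {
    // Penner-style cubic easeInOut: t = elapsed time, b = start value,
    // c = total change in value, d = duration.
    static double easeInOutCubic(double t, double b, double c, double d) {
        t /= d / 2;
        if (t < 1) return c / 2 * t * t * t + b;
        t -= 2;
        return c / 2 * (t * t * t + 2) + b;
    }

    // One amplitude sample every stepMs over durationMs, scaled 0..maxAmplitude.
    static int[] amplitudes(int durationMs, int stepMs, int maxAmplitude) {
        int n = durationMs / stepMs + 1;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = (int) Math.round(easeInOutCubic(i * stepMs, 0, maxAmplitude, durationMs));
        }
        return out;
    }
}
```

`amplitudes(4000, 100, 255)` yields 41 samples easing from 0 up to 255, which could then feed an Android waveform API together with a matching timings array.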

How to determine whether a pedestrian crossed an intersection on OSM

I need to validate whether a pedestrian crossed an intersection, using GPS readings and findNearestIntersectionOSM calls to get the nearest intersections.
For each response from geonames, I check whether the distance between the 2 points is less than a certain threshold, and also, using the sine function, whether the angle between the intersection (GeoPoint.bearingTo) and the pedestrian's current location flips its sign:
sin(previous bearing) * sin(current bearing) < 0
Unfortunately, this is insufficient, and I sometimes receive false positives and so on.
Is there a better approach, or is there anything I'm missing?
Just to be clear, I am not planning to dive into the image processing field; I simply want to use some of OSM's functionality (if possible).
private void OnClosestIntersectionPoint(GeoPoint gPtIntersection) {
    int iDistance = mGeoLastKnownPosition.distanceTo(gPtIntersection);
    double dbCurrentBearing = mGeoLastKnownPosition.bearingTo(gPtIntersection);
    if (mDbLastKnownBearing == null) {
        mDbLastKnownBearing = dbCurrentBearing;
        return;
    }
    // bearingTo() returns degrees, so convert before taking the sine
    boolean bFlippedSignByCrossing =
            Math.sin(Math.toRadians(mDbLastKnownBearing)) * Math.sin(Math.toRadians(dbCurrentBearing)) < 0;
    mDbLastKnownBearing = dbCurrentBearing; // update the bearing regardless of what happens next
    if (bFlippedSignByCrossing && iDistance <= 10 && !HasntMarkIntersectionAsCrossed(gPtIntersection))
        MarkAsIntersectionCrossed(mGeoLastKnownIntersection);
}

Best choice for in memory data structure for IP address filter in Java

I have a file in CIDR format, like 192.168.1.0/24, and it is converted into this two-column (low high) structure:
3232235777 3232236030
Each string IP address conversion happens with this code:
String subnet = "192.168.1.0/24";
SubnetUtils utils = new SubnetUtils(subnet);
Inet4Address a = (Inet4Address) InetAddress.getByName(utils.getInfo().getHighAddress());
long high = bytesToLong(a.getAddress());
Inet4Address b = (Inet4Address) InetAddress.getByName(utils.getInfo().getLowAddress());
long low = bytesToLong(b.getAddress());
private static long bytesToLong(byte[] address) {
long ipnum = 0;
for (int i = 0; i < 4; ++i) {
long y = address[i];
if (y < 0) {
y += 256;
}
ipnum += y << ((3 - i) * 8);
}
return ipnum;
}
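As an aside, the same conversion can be written a little more compactly with a bit mask instead of the sign fix-up; this is an equivalent sketch, not the asker's code:

```java
class IpConvert {
    // Convert a 4-byte IPv4 address to its unsigned 32-bit value in a long.
    static long bytesToLong(byte[] address) {
        long ipnum = 0;
        for (int i = 0; i < 4; ++i) {
            // '& 0xFF' reads the byte as unsigned, so no "if (y < 0) y += 256" is needed
            ipnum = (ipnum << 8) | (address[i] & 0xFF);
        }
        return ipnum;
    }
}
```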
Consider that there are over 5 million entries of (low high: 3232235777 3232236030).
Also, there will be intersections, so an IP can originate from multiple ranges. Just the first one is more than OK.
The data is read-only.
What would be the fastest way to find the range the IP being filtered belongs to? The structure will be entirely in memory, so no database lookups.
UPDATE:
I found the PeerBlock project (it has over a million downloads, so I'm thinking it must have some fast algorithms):
http://code.google.com/p/peerblock/source/browse/trunk/src/pbfilter/filter_wfp.c
Does anyone know what technique the project uses for creating the list of ranges and then searching them?
When it comes down to it, I just need to know whether the IP is present in any of the 5M ranges.
I would consider an n-ary tree, where n=256, and work from the dotted address rather than the converted integer.
The top level would be an array of 256 objects. A null entry means "No", there is no range that contains the address; so given your example 192.168.1.0/24, array[192] would contain an object, but array[100] might be null because no range was defined for any 100.x.x.x/n.
The stored object contains a (reference to) another array[256] and a range specifier, only one of the two would be set, so 192.0.0.0/8 would end up with a range specifier indicating all addresses within that range are to be filtered. This would allow for things like 192.255.0.0/10 where the first 10 bits of the address are significant 1100 0000 11xx xxxx -- otherwise you need to check the next octet in the 2nd level array.
Initially coalescing overlapping ranges, if any, into larger ranges... e.g. 3 .. 10 and 7 .. 16 becomes 3 .. 16 ... allows this, since you don't need to associate a given IP with which range defined it.
This should require no more than 8 comparisons. Each octet is initially used directly as an index, followed by a compare for null, a compare for terminal-node (is it a range or a pointer to the next tree level)
Worst-case memory consumption is theoretically 256^4 (about 4.3 billion) nodes if every IP address were in a filtering range, but of course that would coalesce into a single range, so it would actually be only 1 range object. A more realistic worst case would be more like 256^3 (about 16.7 million) nodes. Real-world usage would probably have the majority of the array[256] nodes at each level empty.
This is essentially similar to Huffman / prefix coding. The shortest distinct prefix can terminate as soon as an answer (a range) is found, so often you would have averages of < 4 compares.
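A hedged sketch of this octet-indexed tree, simplified to prefix lengths that are multiples of 8 so each level consumes exactly one octet (the class and method names are assumptions):

```java
class OctetTrie {
    private static final class Node {
        Node[] children;  // non-null => descend on the next octet
        boolean terminal; // true => every address below this prefix is filtered
    }

    private final Node root = new Node();

    // Insert a CIDR block whose prefix length is prefixOctets * 8 bits.
    void add(int[] octets, int prefixOctets) {
        Node n = root;
        for (int i = 0; i < prefixOctets; i++) {
            if (n.terminal) return; // already covered by a shorter prefix
            if (n.children == null) n.children = new Node[256];
            if (n.children[octets[i]] == null) n.children[octets[i]] = new Node();
            n = n.children[octets[i]];
        }
        n.terminal = true;
        n.children = null; // coalesce anything below this prefix
    }

    // At most 4 array lookups: one per octet of the dotted address.
    boolean contains(int[] octets) {
        Node n = root;
        for (int i = 0; i < 4; i++) {
            if (n.terminal) return true;
            if (n.children == null || n.children[octets[i]] == null) return false;
            n = n.children[octets[i]];
        }
        return n.terminal;
    }
}
```

The shortest-distinct-prefix property shows up in `contains`: a /8 block answers after one level, so the average number of lookups is well under 4.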
I would use a sorted array of int (the base addresses) and another array of the same size (the end addresses). This would use 5M * 8 = 40 MB. The first IP is the base and the second is the last address in the range. You would need to remove intersections first.
To find whether an address is filtered, do a binary search, O(log N); if there is no exact match, check that the address is less than (or equal to) the upper bound of the preceding range.
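A sketch of that lookup using `Arrays.binarySearch` over the sorted base array; the names are illustrative, and the ranges are assumed already merged and sorted as the answer requires:

```java
import java.util.Arrays;

class RangeFilter {
    private final long[] base; // sorted, inclusive start of each range
    private final long[] end;  // end[i] is the inclusive upper bound of range i

    RangeFilter(long[] base, long[] end) {
        this.base = base;
        this.end = end;
    }

    // true if ip falls in any of the (non-overlapping) ranges
    boolean contains(long ip) {
        int i = Arrays.binarySearch(base, ip);
        if (i >= 0) return true;          // exact hit on a base address
        int insertion = -i - 1;           // index of the first base > ip
        if (insertion == 0) return false; // below the lowest range
        return ip <= end[insertion - 1];  // inside the preceding range?
    }
}
```

`Arrays.binarySearch` returns `-(insertionPoint) - 1` on a miss, so the preceding range is the only candidate that could still contain the address.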
I found this binary chop algorithm in Vuze (aka azureus) project:
public IpRange isInRange(long address_long) {
checkRebuild();
if (mergedRanges.length == 0) {
return (null);
}
// assisted binary chop
int bottom = 0;
int top = mergedRanges.length - 1;
int current = -1;
while (top >= 0 && bottom < mergedRanges.length && bottom <= top) {
current = (bottom + top) / 2;
IpRange e = mergedRanges[current];
long this_start = e.getStartIpLong();
long this_end = e.getMergedEndLong();
if (address_long == this_start) {
break;
} else if (address_long > this_start) {
if (address_long <= this_end) {
break;
}
// lies to the right of this entry
bottom = current + 1;
} else if (address_long == this_end) {
break;
} else {
// < this_end
if (address_long >= this_start) {
break;
}
top = current - 1;
}
}
if (top >= 0 && bottom < mergedRanges.length && bottom <= top) {
IpRange e = mergedRanges[current];
if (address_long <= e.getEndIpLong()) {
return (e);
}
IpRange[] merged = e.getMergedEntries();
if (merged == null) {
//inconsistent merged details - no entries
return (null);
}
for (IpRange me : merged) {
if (me.getStartIpLong() <= address_long && me.getEndIpLong() >= address_long) {
return (me);
}
}
}
return (null);
}
Seems to be performing pretty well. If you know about something faster please let me know.
If you just have a CIDR address (or a list of them) and you want to check whether some ipAddress is in the range of that CIDR (or list of CIDRs), just define a Set of SubnetUtils objects.
Unless you are filtering a very large number of addresses, this is all String comparison and will execute extremely fast. You don't need to build a binary tree based on the higher/lower-order bits and all of that complicated jazz.
String subnet = "192.168.1.0/24";
SubnetUtils utils = new SubnetUtils(subnet);
//...
//for each subnet, create a SubnetUtils object
Set<SubnetUtils> subnets = getAllSubnets();
//...
Use a Guava Predicate to filter the ipAddresses that are not in the range of your set of subnets:
Set<String> ipAddresses = getIpAddressesToFilter();
Set<String> ipAddressesInRange =
    Sets.filter(ipAddresses, filterIpsBySubnet(subnets));
Predicate<String> filterIpsBySubnet(final Set<SubnetUtils> subnets) {
    return new Predicate<String>() {
        @Override
        public boolean apply(String ipAddress) {
            for (SubnetUtils subnet : subnets) {
                if (subnet.getInfo().isInRange(ipAddress)) {
                    return true;
                }
            }
            return false;
        }
    };
}
Now if the IP is in any of the subnets, you have a nice simple filter, and you don't have to build a data structure that you will have to unit test. If this is not performant enough, then go to optimization. Don't prematurely optimize :)
Here is the beginning of an answer; I'll come back when I get more free time.
Setup:
Sort the ranges by the starting number.
Since these are IP addresses, I assume that none of the ranges overlap. If there are overlaps, you should probably run through the list, merging ranges and trimming unnecessary ones (e.g. if you have a range 1 - 10, you can trim the range 5 - 7).
To merge or trim do this (assume range a immediately precedes range b):
If b.end <= a.end then range b is a subset of range a and you can remove range b.
If b.start <= a.end and b.end > a.end then you can merge ranges a and b: set a.end = b.end, then remove range b.
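The merge/trim rules above can be sketched like this; the class name and the `long[]{start, end}` representation are assumptions for illustration:

```java
import java.util.*;

class RangeMerger {
    // Coalesce overlapping or nested [start, end] ranges (inclusive bounds)
    // into a minimal, sorted list of disjoint ranges.
    static List<long[]> merge(long[][] ranges) {
        long[][] sorted = ranges.clone();
        Arrays.sort(sorted, Comparator.comparingLong(r -> r[0]));
        List<long[]> out = new ArrayList<>();
        for (long[] r : sorted) {
            long[] last = out.isEmpty() ? null : out.get(out.size() - 1);
            if (last != null && r[0] <= last[1]) {
                last[1] = Math.max(last[1], r[1]); // overlap or subset: extend/absorb
            } else {
                out.add(new long[] { r[0], r[1] });
            }
        }
        return out;
    }
}
```

For example, 3..10, 5..7 and 7..16 collapse into the single range 3..16, matching the coalescing example given earlier in the thread.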
