Idea for a data structure to store 2D data? - java

I have a large 2D grid, x-by-y. The user of the application will add data about specific points on this grid. Unfortunately, the grid is far too big to be implemented as a large x-by-y array because the system on which this is running does not have enough memory.
What is a good way to implement this so that only the points that have data added to them are stored in memory?
My first idea was to create a BST of the data points. A key such as "((long) x << 32) + y" would be used to compare the nodes (the parentheses matter, because in Java + binds more tightly than <<).
I then concluded that this could lose efficiency if not well balanced so I came up with the idea of having a BST of comparable BSTs of points. The outer BST would compare the inner BSTs based on their x values. The inner BSTs would compare the points by their y values (and they would all have the same x). So when the programmer wants to see if there is a point at (5,6), they would query the outer BST for 5. If an inner BST exists at that point then the programmer would query the inner BST for 6. The result would be returned.
Can you think of any better way of implementing this?
Edit: With regard to HashMaps: Most HashMaps require having an array for the lookup. One would say "data[hash(Point)] = Point();" to set a point and then find the Point by hashing it to find the index. The problem, however, is that the array would have to be the size of the range of the hash function. If this range is less than the total number of data points that are added then they would either have no room or have to be added to an overflow. Because I don't know the number of points that will be added, I would have to assume that this number would be less than a certain amount and then set the array to that size. Again, this instantiates a very large array (although smaller than the original if the assumption is that there will be fewer data points than x*y). I would like the structure to scale linearly with the amount of data and not take up a large amount of memory when empty.
It looks like what I want is a SparseArray, as some have mentioned. Are they implemented similarly to having a BST inside of a BST?
Edit2: Map<> is an interface. If I were to use a Map then it looks like TreeMap<> would be the best bet. So I would end up with TreeMap< TreeMap< Point> >, similar to the Map< Map< Point> > suggestions that people have made, which is basically a BST inside of a BST. Thanks for the info, though, because I didn't know that TreeMap<> was basically the Java SDK's implementation of a balanced BST.
Edit3: For those whom it may concern, the selected answer is the best method. Firstly, one must create a Point class that contains (x,y) and implements Comparable. The Point could potentially be compared by something like (((long) x) << 32) + y. Then one would use a TreeMap to map each Point to its data. Searching this is efficient because it is a balanced tree, so a lookup costs O(log n). The user can also query all of this data, or iterate through it, by using the TreeMap.entrySet() method, which returns a set of Points along with their data.
In conclusion, this allows for the space-efficient and search-efficient implementation of a sparse array, or in my case, a 2D array, that can also be iterated through efficiently.
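The approach summarised in Edit3 could look roughly like the sketch below. The names GridPoint and SparseGrid are hypothetical, and the comparison is done field by field rather than by packing both coordinates into a long, which is a swapped-in detail and not something the question prescribes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical immutable grid coordinate, ordered by x and then by y. */
final class GridPoint implements Comparable<GridPoint> {
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int compareTo(GridPoint other) {
        // Compare field by field instead of packing both ints into a long.
        int byX = Integer.compare(x, other.x);
        return byX != 0 ? byX : Integer.compare(y, other.y);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof GridPoint && compareTo((GridPoint) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

/** Sparse grid: memory grows with the number of stored points, not with the grid size. */
final class SparseGrid<V> {
    private final TreeMap<GridPoint, V> cells = new TreeMap<>();

    void put(int x, int y, V value) {
        cells.put(new GridPoint(x, y), value);
    }

    /** Returns the stored value, or null if nothing was stored at (x, y). */
    V get(int x, int y) {
        return cells.get(new GridPoint(x, y));
    }

    int size() {
        return cells.size();
    }

    /** Entries in ascending order of x, then y. */
    Iterable<Map.Entry<GridPoint, V>> entries() {
        return cells.entrySet();
    }
}
```

An empty SparseGrid holds only an empty TreeMap, and each put adds one tree node, so memory use scales with the number of points actually stored.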

Either a Quadtree, a k-d-tree or an R-tree.
Store indices into a large point array in one of these spatial structures.
Such spatial structures are advantageous if the data is not evenly distributed, like geographic data that concentrates in cities and has no points in the sea.
Consider whether you can drop the regular grid and stay with the quadtree.
(Ask yourself why you need a regular grid. A regular grid is usually only a simplification.)
Under no circumstances use Objects to store a Point.
Such an Object needs roughly 20 bytes just for being an object! That is a bad idea for a huge data set.
An int[] x and an int[] y, or a single int[] xy array, is ideal in terms of memory usage.
Consider reading Hanan Samet's "Foundations of Multidimensional and Metric Data Structures" (at least the Introduction).

You could use a Map<Pair, Whatever> to store your data (you have to write the Pair class). If you need to iterate over the data in some specific order, make Pair Comparable and use a NavigableMap.

One approach could be Map<Integer, Map<Integer, Data>>. The key on the outer map is the row value, and the key in the inner map is the column value. The value associated with that inner map (of type Data in this case) corresponds to the data at (row, column). Of course, this won't help if you're looking at trying to do matrix operations or such. For that you'll need sparse matrices.
Another approach is to represent the row and column as a Coordinate class or a Point class. You will need to implement equals and hashCode (should be very trivial). Then, you can represent your data as Map<Point, Data> or Map<Coordinate, Data>.

You could have a list of lists of an object, and that object can encode its horizontal and vertical position.
class MyClass
{
    int x;
    int y;
    ...
}

Maybe I'm being too simplistic here, but I think you can just use a regular HashMap. It would contain custom Point objects as keys:
class Point {
    int x;
    int y;
}
Then you override the equals method (and therefore the hashCode method as well) so that both are based on x and y. That way you only store points that have some data.
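A minimal sketch of that idea follows. The PointStore wrapper and the String value type are assumptions made only for illustration. Note that java.util.HashMap grows its internal table as entries are added, so an empty map stays small.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // Must agree with equals: equal points produce equal hash codes.
        return Objects.hash(x, y);
    }
}

/** Hypothetical wrapper that stores data only for points that were actually set. */
final class PointStore {
    private final Map<Point, String> data = new HashMap<>();

    void set(int x, int y, String value) {
        data.put(new Point(x, y), value);
    }

    /** Returns the stored value, or null if the point has no data. */
    String get(int x, int y) {
        return data.get(new Point(x, y));
    }

    boolean contains(int x, int y) {
        return data.containsKey(new Point(x, y));
    }
}
```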

I think you are on the right track to do this in a memory efficient way - it can be implemented fairly easily by using a map of maps, wrapped in a class to give a clean interface for lookups.
An alternative (and more memory efficient) approach would be to use a single map, where the key was a tuple (x,y). However, this would be less convenient if you need to make queries like 'give me all values where x == some value'.
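The map-of-maps variant, wrapped in a class, might be sketched as follows. NestedGrid and valuesAtX are hypothetical names, and TreeMap is used here only so that iteration is ordered; a HashMap would work equally well for plain lookups.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical wrapper around a map of maps: outer key is x, inner key is y. */
final class NestedGrid<V> {
    private final Map<Integer, Map<Integer, V>> columns = new TreeMap<>();

    void put(int x, int y, V value) {
        // Create the inner map lazily, only when the first point with this x arrives.
        columns.computeIfAbsent(x, k -> new TreeMap<>()).put(y, value);
    }

    /** Returns the stored value, or null if nothing was stored at (x, y). */
    V get(int x, int y) {
        Map<Integer, V> column = columns.get(x);
        return column == null ? null : column.get(y);
    }

    /** All stored values whose x equals the argument, keyed by y; empty if there are none. */
    Map<Integer, V> valuesAtX(int x) {
        Map<Integer, V> column = columns.get(x);
        if (column == null) {
            return Collections.<Integer, V>emptyMap();
        }
        return Collections.unmodifiableMap(column);
    }
}
```

The valuesAtX method is the kind of "all values where x == some value" query that the single-map variant makes awkward.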

You might want to look at FlexCompColMatrix, CompColMatrix and other sparse matrices implementations from the Matrix toolkit project.
The performance will really depend on the write/read ratio and on the density of the matrix, but if you're using a matrix package it will be easier to experiment by switching the implementation.

My suggestion is to use Commons Math, the Apache Commons Mathematics Library. It can save you a lot of work by providing the math functionality that your application requires.

Related

JAVA Graph/DFS implementation

I have a small dilemma I would like to be advised with -
I'm implementing a directed graph and I want to make it extra generic, that is, Graph<T>, where T is the data in the node (vertex).
To add a vertex to the graph will be - add(T t). The graph will wrap T to a vertex that will hold T inside.
Next I would like to run DFS on the graph - now here comes my dilemma -
Should I keep the "visited" mark in the vertex (as a member) or initiate some map while running the DFS (map of vertex -> status)?
Keeping it in the vertex is less generic (the vertex shouldn't be familiar with the DFS algo and implementation). But creating a map (vertex -> status) is very space consuming.
What do you think?
Thanks a lot!
If you need to run algorithms, especially the more complex ones, you will quickly find that you have to associate all kinds of data with your vertices. Having a generic way to store data with the graph items is a good idea, and of course the access time for reading and writing that data should ideally be O(1). A simple implementation could use a HashMap, which has O(1) access time in most cases, although the constant factor is relatively high.
For the yFiles Graph Drawing Library they added a mechanism where the data is actually stored at the elements themselves, but you can allocate as many data slots as you like. This is similar to managing an Object[] with each element and using the index into the data array as the "map". If your graph does not change, another strategy is to store the index of the elements in the graph with the elements themselves (just the integer) and then using that index to index into an array, where for each "data map" you have basically one array the size of the number of elements. Both techniques scale very well and provide the best possible access times, unless your data is really sparse (only a fraction of the elements actually need to store the data).
The "Object[] at Elements" approach:
In your vertex and edge class, add a field of type Object[] that is package private.
Implement a Map interface that provides T getData(Vertex) and void setData(Vertex, T)
One implementation of that interface could be backed by a HashMap<Vertex,T> but the one I was suggesting actually only stores an integer index that is used to index into the Object[] arrays at the vertices.
In your graph class add a method createMap that keeps track of the used and free indices and creates a new instance of the above class whose getter and setter implementations use the package private field of the Vertex class to actually access the data
The "One Array" approach:
Add a package private integer field to your Vertex class
Keep the integer fields in sync with the order of the vertices in your graph - the first Vertex has index 0, etc.
In the alternative map implementation, you initially allocate one T[] that has the size of the number of vertices.
In the getter and setter implementations you take the index of the Vertex and use that to access the values in the array.
For the DFS algorithm I would choose the "one array"-approach as you could use a byte[] (or if "Visited" is all that is required you could even use a BitSet) for space efficiency and you are likely to populate the data for all vertices in DFS if your graph is connected. This should perform a lot better than a HashMap based approach and does not require the boxing and unboxing for storing the data in the Object[].
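As a rough illustration of the "one array" approach with a BitSet, here is a sketch in which each vertex carries only its index and the DFS keeps the visited marks outside the vertex. The class layout, the successors list and the iterative traversal are assumptions for this example, not the yFiles API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

final class Vertex<T> {
    final T data;
    final int index;   // position of this vertex in the graph's vertex list
    final List<Vertex<T>> successors = new ArrayList<>();

    Vertex(T data, int index) {
        this.data = data;
        this.index = index;
    }
}

final class Graph<T> {
    private final List<Vertex<T>> vertices = new ArrayList<>();

    /** Wraps the payload in a vertex; the first vertex gets index 0, the next 1, and so on. */
    Vertex<T> add(T data) {
        Vertex<T> v = new Vertex<>(data, vertices.size());
        vertices.add(v);
        return v;
    }

    void addEdge(Vertex<T> from, Vertex<T> to) {
        from.successors.add(to);
    }

    /** Iterative DFS from start; returns the payloads in the order they were visited. */
    List<T> dfs(Vertex<T> start) {
        BitSet visited = new BitSet(vertices.size());   // one bit per vertex
        List<T> order = new ArrayList<>();
        Deque<Vertex<T>> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Vertex<T> v = stack.pop();
            if (visited.get(v.index)) {
                continue;   // already handled via another path
            }
            visited.set(v.index);
            order.add(v.data);
            // Push in reverse so that the first successor is visited first.
            for (int i = v.successors.size() - 1; i >= 0; i--) {
                Vertex<T> next = v.successors.get(i);
                if (!visited.get(next.index)) {
                    stack.push(next);
                }
            }
        }
        return order;
    }
}
```

The vertex knows nothing about DFS beyond its own index, and the BitSet is discarded when the traversal finishes. This sketch assumes vertices are never removed, so the indices stay in sync with the vertex list.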

Comparing player strengths and weaknesses

I'm developing a game which will have the same sort of system as Pokemon does, i.e. every player will have a 'type' (fire, water, grass etc.). When players fight, I need to determine what factor to multiply attacks by, to create strengths and weaknesses. So far I'm using a switch in each 'type' class which takes another 'type' class as input and returns the multiplication factor. With only three of these 'type' classes, I'm writing a lot of code and I can foresee it getting out of hand in the future when I want to add more.
So my question is, how can I implement a DRY solution for determining strengths and weaknesses of each type? I've attached a table of the pokemon types as a reference for what it is I am trying to do.
How about enumerating the types, and building a 2D matrix that looks just like the one you posted. Whenever you need the "factor" for a battle, look the factor up using the attacker and defender as indices in the 2D array. Lookups would be fast and the code would be pretty clean.
Sample use cases would look something like this:
factor = factorTable[FIRE][WATER]; // would set factor to 0.5
factor = factorTable[WATER][FIRE]; // would set factor to 2.0
As Noctua suggested, it might be a good idea to have the actual data in a config file. That way you can easily change it without recompiling. If you go for that option, you'd need some kind of parsing function to create the matrix at the beginning of the program.
An even better step to take next would be to encapsulate the table behavior and type representation in classes. The underlying implementation could still be the same (or change, that's the point) but you wouldn't expose the table nor the enumerations directly.
factor = StrengthFactors(Player1.Type(), Player2.Type()); // or similar
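In Java, the enumeration and the table could be combined in one enum, roughly as sketched below. PlayerType and factorAgainst are hypothetical names, and apart from the two fire/water values given above, the factors are placeholder assumptions rather than real game data.

```java
/** Hypothetical player types with an encapsulated attacker-versus-defender factor table. */
enum PlayerType {
    FIRE, WATER, GRASS;

    // Rows are the attacker, columns the defender, both in declaration order.
    // Only FIRE/WATER = 0.5 and WATER/FIRE = 2.0 come from the text; the rest are assumed.
    private static final double[][] FACTORS = {
        //            FIRE  WATER GRASS
        /* FIRE  */ { 0.5,  0.5,  2.0 },
        /* WATER */ { 2.0,  0.5,  0.5 },
        /* GRASS */ { 0.5,  2.0,  0.5 },
    };

    /** Multiplier applied when this type attacks the given defender type. */
    double factorAgainst(PlayerType defender) {
        return FACTORS[ordinal()][defender.ordinal()];
    }
}
```

A call such as PlayerType.FIRE.factorAgainst(PlayerType.WATER) then replaces the per-class switch statements, and adding a type means adding one enum constant plus one row and one column.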
I think you should use a single array of strings to store the different types. Then you use a 2D matrix to store the multipliers. The idea is to use a type's index in this array of strings to locate its multiplier in the matrix. You will have O(n) complexity to find the multiplier you want.

Linearly "smoothed" function using lookup table

If I want to write a function (probably also a class) which returns linearly "smoothed" data from an immutable lookup table (fixed when calling constructor) like this:
For example func(5.0) == 0.5.
What is the best way of storing the lookup table?
I am thinking of using two arrays.
Could there be other better ways?
What is the best way of calculating the required value? (In terms of real-time efficiency, excluding preparation time)
I am thinking of pre-sorting the lookup table on arg and using a binary search to find the nearest two points.
Or should I build a binary tree to simplify searching?
Or could there be other better ways?
(Minor) What would one call this kind of function/class/data-structure/algorithms? Is there any formal names for this in Computer Science?
I think I may need to write my own class. The class, at best, should be immutable because there is no need to change it after initialization and probably multiple threads will be using it. I may also need to get keys and values by the index.
Looks like you are trying to linearly interpolate a set of points. I'd use java.util.NavigableMap. It provides methods such as higherEntry(K key) and lowerEntry(K key), which make it easy to get the neighboring points.
You put your (x,y) pairs into the map. When queried for f(x_i), you'd first check whether the mapping is contained in your map; if it is, return it. If not, you'd call higherKey(x_i) and lowerKey(x_i) to find the two neighboring points. Then use the formula to interpolate between those two points (see Wikipedia's Linear Interpolation page).
I'd also implement the interpolation logic in a different class, and pass it in as a constructor argument to your function class in case you want to use different interpolation methods (e.g. polynomial interpolation) later.
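A minimal sketch of the NavigableMap idea follows. It uses floorEntry and ceilingEntry instead of lowerEntry and higherEntry, so an exact hit needs no separate containment check. The class name LinearInterpolator and the choice to clamp outside the table's range are assumptions, since the question does not say what should happen there.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical lookup table with linear interpolation between neighboring points. */
final class LinearInterpolator {
    private final TreeMap<Double, Double> points;

    LinearInterpolator(Map<Double, Double> table) {
        if (table.isEmpty()) {
            throw new IllegalArgumentException("table must not be empty");
        }
        // Defensive copy that is never modified afterwards, so instances can be shared by threads.
        this.points = new TreeMap<>(table);
    }

    double valueAt(double x) {
        Map.Entry<Double, Double> lo = points.floorEntry(x);     // greatest key <= x
        Map.Entry<Double, Double> hi = points.ceilingEntry(x);   // least key >= x
        if (lo == null) return hi.getValue();   // left of the table: clamp (assumption)
        if (hi == null) return lo.getValue();   // right of the table: clamp (assumption)
        double x0 = lo.getKey();
        double x1 = hi.getKey();
        if (x0 == x1) {
            return lo.getValue();               // x is exactly a table entry
        }
        double t = (x - x0) / (x1 - x0);        // fraction of the way from x0 to x1
        return lo.getValue() + t * (hi.getValue() - lo.getValue());
    }
}
```

Each query costs two O(log n) tree lookups. With a table containing (0, 0) and (10, 1), valueAt(5.0) returns 0.5.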

Efficient java map for values.contains(object) in O(1)?

I've recently started writing a tool for modelling graphs (nodes and edges, both represented by objects, not adjacency matrices or simply numbers) to get some practice in software development alongside grad school. I'd like a node to know its neighbors and the edges it's incident with. An obvious way to do this would be to take a HashSet<Node> and a HashSet<Edge>. What I would like to do however is have a method
Node_A.getEdge(Node B)
that returns the edge between nodes A and B in O(1). I was thinking of doing this by replacing the two HashSets mentioned above by using one HashMap that maps the neighbors to the edges that connect a node with its neighbors. For example, calling
Node_A.hashmap.get(B)
should return the edge that connects A and B. My issue here is whether there's a way to have
HashMap.keySet().contains(Node A);
HashMap.values().contains(Edge e);
both work in O(1)? If that isn't the case with the standard libraries, are there implementations that will give me constant time for add, remove, contains, size for keySet() and values()?
Try not to over-think this.
E.g. you are saying you get the edge between A and B by Node_A.getEdge(Node B). But normally it is the A->B-information that IS considered the edge. (So you are requiring the information you should be retrieving.)
You say you want O(1), understandably, and obviously storing an adjacency-matrix will only give you O(n), so you cannot use that; but already the second-most obvious choice storing a list of adjacent nodes within each node object will give you what you want. (Note: this is different from having a global adjacency-list which would give you O(n)).
As AmitD said, HashSet and HashMap already have constant times for contains, get and put. For the HashMap.values().contains(Edge e) case you may want to look at HashBiMap in the Google Guava library.
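Using only the standard library, the neighbor-to-edge map from the question could be sketched like this. The names edgesByNeighbor, connect and isIncident are hypothetical, and the sketch assumes an undirected graph with at most one edge per pair of nodes. Under that assumption, an incidence check does not need values().contains at all, because the edge itself says which neighbor to look up.

```java
import java.util.HashMap;
import java.util.Map;

final class Edge {
    final Node a;
    final Node b;

    Edge(Node a, Node b) {
        this.a = a;
        this.b = b;
    }
}

final class Node {
    final String name;
    // Maps each neighbor to the edge connecting it with this node.
    private final Map<Node, Edge> edgesByNeighbor = new HashMap<>();

    Node(String name) {
        this.name = name;
    }

    /** Connects two nodes with a single shared edge object. */
    static Edge connect(Node a, Node b) {
        Edge e = new Edge(a, b);
        a.edgesByNeighbor.put(b, e);
        b.edgesByNeighbor.put(a, e);
        return e;
    }

    /** Expected O(1): returns the connecting edge, or null if other is not a neighbor. */
    Edge getEdge(Node other) {
        return edgesByNeighbor.get(other);
    }

    boolean isNeighbor(Node other) {
        return edgesByNeighbor.containsKey(other);
    }

    /** Expected O(1): look up the edge's far endpoint instead of scanning values(). */
    boolean isIncident(Edge e) {
        Node far = (e.a == this) ? e.b : e.a;
        return edgesByNeighbor.get(far) == e;
    }
}
```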

Data-structure to "map" collections to states in a dynamic-programming algorithm

I am coding for fun an algorithm to determine the best order of constructing N Building objects. Of course, each Building has its own characteristics (such as a cost, a production, a time of construction, ...). There also exists a total ordering over the Building objects based on those characteristics.
At some point in my dynamic programming I need a suitable data structure to retrieve the best result reached so far for constructing k (k<=N) Buildings. I need this data structure to somehow "map" a collection of the k Buildings (possibly sorted, since constructing Building b1 and then b2, or b2 and then b1, leaves me with the same N-k buildings but can most likely lead to different states) to the "best state" reached so far.
I could probably use simple HashMap but it implies repeating a huge number of times collections containing the same elements, not taking into account that [b1,b2] is a sub-collection of [b1,b2,b3,b4] for instance.
I hope I made myself sufficiently clear on that one and I thank you for your help :)
It's impossible to answer without knowing the structure of your solutions.
But if the solution for k is obtainable from the solution for (k-1) by inserting a building b at position j, then you can simply have a HashMap mapping an integer i to the "delta" between the solution for i and the solution for i-1, expressed as a pair.
But having to deal explicitly with deltas can be hideous, because you need to traverse them to get a solution.
You can solve this by letting each delta know the delta it refers to (i.e., passing the delta for (k-1) to the constructor of the delta for k), and then exposing a method getSolution() which performs the actual traversal.
You can extend this idea to similar solution structures.
You can use a LinkedHashSet for the key, with the value being the cost of building the Buildings contained in the set in its iteration order. It's a bit of a hack, but given how hashCode and equals are defined it should work. If you prefer to be more explicit, go with a HashMap from a set to a pair of cost and build order.
If your solutions look like ABC,cost1, ABCD,cost2, create a linked list d-> c-> b-> a. Store solutions as tuples of cost and a reference to the last element contained in your Solution (the earliest in the list)
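The shared linked list from the last suggestion might be sketched as follows. BuildStep, Solution and extend are hypothetical names, and buildings are represented by plain strings purely for illustration. Each solution stores only its cost and a reference to its last step, so ABCD reuses the nodes of ABC instead of copying them.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** One node of the shared list: a building plus a link to the step built before it. */
final class BuildStep {
    final String building;      // hypothetical building identifier
    final BuildStep previous;   // null for the first building in the order

    BuildStep(String building, BuildStep previous) {
        this.building = building;
        this.previous = previous;
    }
}

/** A solution is a cost plus a reference to the last step of its build order. */
final class Solution {
    final double cost;
    final BuildStep last;

    Solution(double cost, BuildStep last) {
        this.cost = cost;
        this.last = last;
    }

    /** Creates a longer solution that shares all existing steps with this one. */
    Solution extend(String building, double newCost) {
        return new Solution(newCost, new BuildStep(building, last));
    }

    /** Walks the links backwards and returns the build order from first to last. */
    List<String> buildOrder() {
        Deque<String> order = new ArrayDeque<>();
        for (BuildStep s = last; s != null; s = s.previous) {
            order.addFirst(s.building);
        }
        return new ArrayList<>(order);
    }
}
```

Extending a solution allocates one node regardless of k, at the price of an O(k) walk whenever the full build order is needed.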
