Efficient Java map for values().contains(object) in O(1)? - java

I've recently started writing a tool for modelling graphs (nodes and edges, both represented by objects, not adjacency matrices or plain numbers) to get some practice in software development alongside grad school. I'd like a node to know its neighbors and the edges it is incident with. An obvious way to do this would be to give each node a HashSet<Node> and a HashSet<Edge>. What I would like to do, however, is have a method
Node_A.getEdge(Node B)
that returns the edge between nodes A and B in O(1). I was thinking of replacing the two HashSets mentioned above with a single HashMap that maps each neighbor to the edge connecting the node to that neighbor. For example, calling
Node_A.hashmap.get(B)
should return the edge that connects A and B. My issue here is whether there's a way to have
HashMap.keySet().contains(Node A);
HashMap.values().contains(Edge e);
both work in O(1)? If that isn't the case with the standard libraries, are there implementations that will give me constant time for add, remove, contains, size for keySet() and values()?
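A minimal sketch of the neighbor-map idea (Node and Edge here are bare stand-ins for the question's classes; note that on a plain HashMap, the values().contains check is the part that remains O(n)):
import java.util.HashMap;
import java.util.Map;

class Edge { Node from, to; }

class Node {
    private final Map<Node, Edge> neighbors = new HashMap<>();

    void connect(Node other, Edge e) { neighbors.put(other, e); }

    Edge getEdge(Node other) { return neighbors.get(other); }               // expected O(1)

    boolean isNeighbor(Node other) { return neighbors.containsKey(other); } // expected O(1)

    // Note: neighbors.containsValue(e) on a plain HashMap scans all values, i.e. O(n).
}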

Try not to over-think this.
E.g. you are saying you get the edge between A and B via Node_A.getEdge(Node B). But normally it is the A->B information that IS considered the edge. (So you are requiring as input the very information you should be retrieving.)
You say you want O(1), understandably, and obviously storing an adjacency matrix will only give you O(n), so you cannot use that; but already the second-most obvious choice, storing the adjacent nodes within each node object, will give you what you want. (Note: this is different from having a global adjacency list, which would give you O(n).)

As AmitD said, HashSet and HashMap already have constant-time contains, get and put. For the HashMap.values().contains(Edge e) case, you may want to look at HashBiMap in the Google Guava library.
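For illustration, a rough sketch of the Guava approach. HashBiMap maintains an inverse map internally, so containsValue is also an expected O(1) hash lookup (Node and Edge are stand-ins; a bimap requires the edges to be unique values, which holds here since a node has at most one edge per neighbor):
import com.google.common.collect.HashBiMap;

class BiMapSketch {
    static class Node {}
    static class Edge {}

    public static void main(String[] args) {
        HashBiMap<Node, Edge> neighbors = HashBiMap.create();
        Node b = new Node();
        Edge ab = new Edge();
        neighbors.put(b, ab);

        System.out.println(neighbors.containsKey(b));         // true, expected O(1)
        System.out.println(neighbors.containsValue(ab));      // true, expected O(1) via the inverse map
        System.out.println(neighbors.inverse().get(ab) == b); // recover the neighbor from the edge
    }
}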

Related

Fast LinkedList search and delete in java

I am using Java's LinkedList in my project. I have to build a delete function that removes an element with a specified unique id (id is a field in my class) from the LinkedList. As per the official Java documentation, were I to use LinkedList.remove, the runtime would be O(n): the process happens in two steps, a linear search with a runtime of O(n) followed by the actual delete, which takes O(1).
In an attempt to speed things up, I wanted to use a binary tree for lookup, where each node in the tree is (id, reference to the node in the LinkedList). I am not exactly sure how to implement this in Java. In C/C++, one could just store a pointer to the list node as that reference.
==
If you are wondering why I have to use LinkedList, it's because I am building an order-matching engine for exchanges. LinkedList offers superior runtime as far as insert is concerned. I am also using insertion sort to keep prices in the orderbook sorted. Priority queue does not suit my needs because I have to show the sorted order book in real time.
Have you seen the video of Stroustrup's conference talk where he showed that you should use std::vector unless you have measured a performance benefit of not using std::vector? He showed that std::vector is almost always the correct collection to use, and showed that it is faster than linked list even when inserting and deleting in the middle.
Now translate that to Java: use ArrayList unless you have measured better performance with something else.
Why is that? With modern processor architectures, there is a locality benefit: elements that you compare together, elements that you process together, are all stored next to each other in memory and are likely to be in the CPU's cache at the same time. This allows them to be fetched and written to much faster than when they're in main memory. This is not the case with a linked list, where elements are allocated individually and spread all over the place. (This locality benefit is much more pronounced in C++ where you have the actual objects next to each other, but it's still valid to a smaller extent in Java, where you have the references next to each other, albeit not the actual objects.)
Now with ArrayList, you can keep the orders sorted by price, and use binary search to insert an order in the right place.
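A sketch of that insertion step; the Order type and its price field are assumptions here:
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class OrderBookSketch {
    static class Order { long price; Order(long p) { price = p; } }

    private final List<Order> orders = new ArrayList<>();
    private final Comparator<Order> byPrice = Comparator.comparingLong((Order o) -> o.price);

    void insert(Order o) {
        int idx = Collections.binarySearch(orders, o, byPrice);
        if (idx < 0) idx = -idx - 1; // binarySearch returns -(insertionPoint) - 1 when absent
        orders.add(idx, o);          // O(log n) to find the slot; the shift is O(n) but cache-friendly
    }
}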
If your performance measurement shows that LinkedList is preferable, then unfortunately Java doesn't give you access to the internal representation – the actual nodes – of the LinkedList, so you'll have to homebrew your own list.
Why are you using a List?
If you have a unique id for each object, why not put it in a Map with the id as the key? If you choose a HashMap as the implementation, removal is O(1). If you use a LinkedHashMap, you preserve insertion order as well.
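A minimal sketch of that suggestion (the Order type and its id field are assumptions):
import java.util.LinkedHashMap;
import java.util.Map;

class IdIndexedBook {
    static class Order { long id; Order(long id) { this.id = id; } }

    // LinkedHashMap preserves insertion order; removal by key is still expected O(1)
    private final Map<Long, Order> byId = new LinkedHashMap<>();

    void add(Order o) { byId.put(o.id, o); }
    Order removeById(long id) { return byId.remove(id); } // no linear scan needed
}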
LinkedList insertion is superior to....what?
You can easily solve this by having a small change.
First have an object that has your value and id as fields
class MyElement implements Comparable<MyElement> {
    int id, value;
    // Sort based on values
    public int compareTo(MyElement other) { return Integer.compare(value, other.value); }
    // Compare ids
    public boolean equals(Object other) { return other instanceof MyElement && ((MyElement) other).id == id; }
    // Return the id as the hash code
    public int hashCode() { return id; }
}
Now use a TreeSet to store these objects.
In this data structure the incoming objects are kept sorted, and both insertion and deletion take O(log n).
To preserve order by id and get good performance, use a TreeMap. Put, remove and get operations will be O(log n).
EDIT:
For preserving the insertion order of elements for each id you can use TreeMap<Integer, ArrayList<T>>, i.e. for each id you keep the elements with that id in a list, in insertion order.
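For illustration, a small sketch of that structure, using String as a placeholder element type:
import java.util.ArrayList;
import java.util.TreeMap;

class PerIdInsertionOrder {
    public static void main(String[] args) {
        TreeMap<Integer, ArrayList<String>> byId = new TreeMap<>();
        byId.computeIfAbsent(42, id -> new ArrayList<>()).add("first");
        byId.computeIfAbsent(42, id -> new ArrayList<>()).add("second");
        // ids iterate in sorted order; each list preserves insertion order for its id
        System.out.println(byId); // {42=[first, second]}
    }
}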

How to efficiently remove an element from java LinkedList

I have an algorithm where I pass through nodes in a graph in a certain way, occasionally passing through the same node several times, and I need to form a list of the nodes passed, such that each node appears exactly once, in the position of the last time I passed it.
For instance, if I passed through nodes A -> B -> C -> A -> C, the list I need in the end is [B, A, C].
What I wanted to do was to use a LinkedList, such that every node in the graph will contain a reference to its node in the LinkedList. Then, every time I pass through a node, I will remove its corresponding node from the LinkedList and insert it again into the end of the LinkedList, and the complexity of the operation will only be O(1).
However, when I began implementing this, I ran into a problem: apparently, the Java class LinkedList does not let me see its actual list nodes. Using the regular remove methods of LinkedList to remove the list node containing a given graph node will be O(n) instead of O(1), negating the whole point of using a LinkedList to begin with.
Naturally, I can implement LinkedList myself, but I would rather avoid that - it seems to me that if I have to implement LinkedList in java, I'm doing something wrong.
So, is there a way to solve this problem without implementing LinkedList on my own? Is there something that I'm missing?
As it seems, you are expecting a built-in approach; I don't think there is any Collection which provides such functionality. You will have to implement it on your own, as @MartijinCourteaux suggested. Or:
use a sorted set collection like TreeSet<E>, with a cost of O(log n) for the operations add, remove and contains.
LinkedHashSet<E>: like HashSet<E>, it has expected O(1) performance for add, contains and remove, though likely just slightly below HashSet's because of the added expense of maintaining the linked list; you still avoid the increased cost associated with TreeSet. Beware, however, that insertion order is not affected if an element is re-inserted into the set, so remove the earlier occurrence of an element before re-inserting it.
LinkedHashMap keeps the order in which entries were added, and allows you to remove an entry by its key and then put it back at the end. I think that is all you need.
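A sketch of that remove-then-put trick on the question's A -> B -> C -> A -> C walk, with String standing in for the node type:
import java.util.LinkedHashMap;

class LastVisitOrder {
    public static void main(String[] args) {
        LinkedHashMap<String, Boolean> visited = new LinkedHashMap<>();
        for (String node : new String[] {"A", "B", "C", "A", "C"}) {
            visited.remove(node);            // expected O(1); a no-op on the first visit
            visited.put(node, Boolean.TRUE); // re-insert at the end
        }
        System.out.println(visited.keySet()); // [B, A, C]
    }
}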
Unless your linked list is large, just using a regular ArrayList will give fast performance even with the shuffling. You should also consider a HashSet if order is not important, a LinkedHashSet if insertion order matters, or a TreeSet if you want the elements sorted. None of them allow duplicate values, but all have good big-O performance for insert, delete and contains.

Idea for a data structure to store 2D data?

I have a large 2D grid, x-by-y. The user of the application will add data about specific points on this grid. Unfortunately, the grid is far too big to be implemented as a large x-by-y array because the system on which this is running does not have enough memory.
What is a good way to implement this so that only the points that have data added to them are stored in memory?
My first idea was to create a BST of the data points. A packed key such as "((long)x << 32) + y" would be used to compare the nodes.
I then concluded that this could lose efficiency if not well balanced so I came up with the idea of having a BST of comparable BSTs of points. The outer BST would compare the inner BSTs based on their x values. The inner BSTs would compare the points by their y values (and they would all have the same x). So when the programmer wants to see if there is a point at (5,6), they would query the outer BST for 5. If an inner BST exists at that point then the programmer would query the inner BST for 6. The result would be returned.
Can you think of any better way of implementing this?
Edit: In regards to HashMaps: most HashMaps require an array for the lookup. One would say "data[hash(Point)] = Point();" to set a point and then find the Point by hashing it to get the index. The problem, however, is that the array would have to be the size of the range of the hash function. If this range is smaller than the total number of data points added, the points would either have no room or have to go to an overflow area. Because I don't know the number of points that will be added, I would have to assume this number is below a certain amount and then set the array to that size. Again, this instantiates a very large array (although smaller than before, if the assumption is that there will be fewer data points than x*y). I would like the structure to scale linearly with the amount of data and not take up a large amount of memory when empty.
It looks like what I want is a SparseArray, as some have mentioned. Are they implemented similarly to having a BST inside of a BST?
Edit2: Map<> is an interface. If I were to use a Map then it looks like TreeMap<> would be the best bet, so I would end up with TreeMap<Integer, TreeMap<Integer, Point>>, similar to the Map<Integer, Map<Integer, Point>> suggestions that people have made, which is basically a BST inside of a BST. Thanks for the info, though, because I didn't know that TreeMap<> was basically the Java SDK's take on a BST.
Edit3: For those whom it may concern, the selected answer is the best method. First, one creates a Point class that contains (x,y) and implements Comparable; the points could, for example, be compared by something like ((long)x << 32) + y. Then one maps each Point to its data in a TreeMap. Searching this is efficient because the tree is balanced, so lookups cost O(log n). The user can also query or iterate over all of this data using the TreeMap.entrySet() method, which returns the set of Points along with their data.
In conclusion, this allows for the space-efficient and search-efficient implementation of a sparse array, or in my case, a 2D array, that can also be iterated through efficiently.
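A minimal sketch of that approach, with String standing in for the data type; the packed-long comparison assumes non-negative coordinates:
import java.util.TreeMap;

class SparseGridSketch {
    static class Point implements Comparable<Point> {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        public int compareTo(Point o) {
            // pack (x, y) into one long; assumes 0 <= x, y < 2^31
            return Long.compare((((long) x) << 32) + y, (((long) o.x) << 32) + o.y);
        }
    }

    public static void main(String[] args) {
        TreeMap<Point, String> grid = new TreeMap<>();
        grid.put(new Point(5, 6), "sample");
        System.out.println(grid.get(new Point(5, 6))); // O(log n) lookup in a balanced tree
        // grid.entrySet() iterates every populated point together with its data
    }
}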
Either a Quadtree, a k-d-tree or an R-tree.
Store indices into a large point array in one of these spatial structures.
Such spatial structures are advantageous if the data is not equally distributed, like geographic data that concentrates in cities and has no points in the sea.
Consider whether you can forget the regular grid and stay with the quadtree.
(Ask yourself why you need a regular grid; a regular grid is usually only a simplification.)
Under no circumstances use Objects to store a Point.
Such an object needs some 20 bytes just for the fact that it is an object: a bad idea for a huge data set.
An int[] x and int[] y, or a single int[] xy array, is ideal in terms of memory usage.
Consider reading
Hanan Samet's "Foundations of Multidimensional Data Structures"
(at least the Introduction).
You could use a Map<Pair, Whatever> to store your data (you would have to write the Pair class). If you need to iterate the data in some specific order, make Pair Comparable and use a NavigableMap.
One approach could be Map<Integer, Map<Integer, Data>>. The key on the outer map is the row value, and the key in the inner map is the column value. The value associated with that inner map (of type Data in this case) corresponds to the data at (row, column). Of course, this won't help if you're looking at trying to do matrix operations or such. For that you'll need sparse matrices.
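A sketch of that map-of-maps idea, with String standing in for the Data type:
import java.util.HashMap;
import java.util.Map;

class NestedMapGrid {
    // row -> (column -> data); only cells that were actually set consume memory
    private final Map<Integer, Map<Integer, String>> rows = new HashMap<>();

    void put(int row, int col, String data) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, data);
    }

    String get(int row, int col) {
        Map<Integer, String> cols = rows.get(row);
        return cols == null ? null : cols.get(col);
    }
}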
Another approach is to represent the row and column as a Coordinate class or a Point class. You will need to implement equals and hashCode (should be very trivial). Then, you can represent your data as Map<Point, Data> or Map<Coordinate, Data>.
You could have a list of lists of an object, and that object can encode its horizontal and vertical position.
class MyClass
{
int x;
int y;
...
}
Maybe I'm being too simplistic here, but I think you can just use a regular HashMap. It would contain custom Point objects as keys:
class Point {
int x;
int y;
}
Then you override the equals method (and thus the hashCode method) to be based on x and y. That way you only store points that have some data.
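For illustration, those overrides could look like this; the hash formula is just one common choice:
import java.util.HashMap;
import java.util.Map;

class PointKeyedGrid {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; } // one common way to combine two ints
    }

    // String stands in for whatever data is attached to a point
    private final Map<Point, String> grid = new HashMap<>();

    void put(int x, int y, String data) { grid.put(new Point(x, y), data); }
    String get(int x, int y) { return grid.get(new Point(x, y)); } // only populated points are stored
}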
I think you are on the right track to do this in a memory efficient way - it can be implemented fairly easily by using a map of maps, wrapped in a class to give a clean interface for lookups.
An alternative (and more memory efficient) approach would be to use a single map, where the key was a tuple (x,y). However, this would be less convenient if you need to make queries like 'give me all values where x == some value'.
You might want to look at FlexCompColMatrix, CompColMatrix and other sparse matrices implementations from the Matrix toolkit project.
The performance will really depend on the write/read ratio and on the density of the matrix, but if you're using a matrix package it will be easier to experiment by switching implementations.
My suggestion is to use Commons Math, the Apache Commons Mathematics Library. It can save your day by providing the mathematical machinery your application requires.

Data-structure to "map" collections to states in a dynamic-programming algorithm

For fun, I am coding an algorithm to determine the best order in which to construct N Building objects. Of course, each Building has its own characteristics (such as a cost, a production, a time of construction, ...). There also exists a total ordering over the Building objects based on those characteristics.
At some point in my dynamic programming I need a suitable data structure to retrieve the best result reached so far for constructing k (k <= N) Buildings. I need this data structure to somehow "map" a collection of the k Buildings (possibly sorted, since constructing Building b1 and then b2, or b2 and then b1, leaves me with the same N-k remaining buildings but can most likely lead to different states) to the "best state" reached so far.
I could probably use a simple HashMap, but that means storing, a huge number of times, collections containing the same elements, and it does not take into account that [b1,b2] is a sub-collection of [b1,b2,b3,b4], for instance.
I hope I made myself sufficiently clear on that one and I thank you for your help :)
It's impossible to answer without knowing the structure of your solutions.
But if the solution for k is obtainable from the solution for (k-1) by inserting a building b at position j, then you can simply have a hashmap mapping the integer i to the "delta" between the solution for i and the solution for i-1, expressed as a pair.
Having to deal explicitly with deltas can be hideous, though, because you need to traverse the whole chain to reconstruct a solution.
You can solve this by letting each delta know the delta it references (i.e., passing the delta for (k-1) to the constructor of the delta for k), and then exposing a method getSolution() which performs the actual traversal.
You can extend this idea to similar solution structures.
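A rough sketch of that delta chain; the Building placeholder, the stored fields, and the insert-position representation are all assumptions:
import java.util.ArrayList;
import java.util.List;

class Delta {
    static class Building {}

    final Delta previous;    // the delta for k-1, or null at the base case
    final Building inserted; // the building added at step k
    final int position;      // where it was inserted into the construction order

    Delta(Delta previous, Building inserted, int position) {
        this.previous = previous;
        this.inserted = inserted;
        this.position = position;
    }

    // Reconstruct the full order for step k by traversing the chain of deltas
    List<Building> getSolution() {
        List<Building> order = (previous == null) ? new ArrayList<>() : previous.getSolution();
        order.add(position, inserted);
        return order;
    }
}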
You can use a LinkedHashSet as the key, and as the value the cost of building the Buildings contained in the set, in its iteration order. It's a bit of a hack, but going by hashCode and equals it should work. If you prefer to be more explicit, go with a HashMap from set to a pair of cost and build order.
If your solutions look like (ABC, cost1), (ABCD, cost2), create a linked list d -> c -> b -> a. Store solutions as tuples of a cost and a reference to the last element contained in your solution (the earliest one in the list).

customize an indexof call for a linkedlist (java)

I'm working with a very large LinkedList of custom objects, and I'm trying to determine whether an object I'm about to add to the list is already in there.
The issue is that the item I am searching for is a unique object containing:
A 1st String
A 2nd String
A unique Count #
I'm trying to find out if there is an item in my linked list that matches on the 1st String and the 2nd String, while ignoring the unique Count #.
This can be done the dumb way (the way I tried first) by going through each individual list item, but that takes way too long. I'm trying to speed it up! I figured indexOf would help, but I don't know how I can customize what it searches for.
Any ideas?
indexOf() has O(n) performance as well because it progressively scans the List until it finds the element you're looking for.
Is the list sorted? If so, you might be able to find an element using something like binary search.
If you need constant time access for random elements, I don't think a Linked List is your best bet.
Do you NEED to use a LinkedList? If it's not legacy code, I would recommend either HashSet or LinkedHashMap. Both will give you constant-time lookup, and if you still need insertion-order iteration, LinkedHashMap has an internal LinkedList running through the keys.
Unfortunately the "dumb way" is the most efficient way to do this, although you could use
if ( linkedList.contains(objectThatMayBeInList) ) { //do something }
The problem is that searching a LinkedList is O(N), where N is the size of the list, which means a worst case of N comparisons on any given search. Linked lists are not the best data structure for that kind of operation, but at the same time it's not that bad, and it shouldn't be too slow; computers are good at doing that. Are there more specifics you can give us, such as the size of the list?
Basically you want to find out if object A exists in linked list L. This is the search problem, and if the list is unordered you cannot do it faster than O(n).
If you kept the list sorted (making insertion slower), you could do a binary search to see if A is in the list, which would be much faster.
Perhaps you could also keep a Map (HashMap or TreeMap for instance) in addition to the list, where you keep track of what stuff is in the list.
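One way to sketch that auxiliary structure: keep a HashSet keyed on the two strings only, so the membership check ignores the count (the field names and the separator character are assumptions):
import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

class DedupLinkedList {
    static class Item {
        final String first, second;
        final int count;
        Item(String first, String second, int count) {
            this.first = first; this.second = second; this.count = count;
        }
    }

    private final LinkedList<Item> items = new LinkedList<>();
    private final Set<String> seen = new HashSet<>();

    // assumes '\u0000' never occurs inside either string
    private static String key(Item i) { return i.first + '\u0000' + i.second; }

    boolean addIfAbsent(Item i) {
        if (!seen.add(key(i))) return false; // expected O(1), no list traversal
        items.add(i);
        return true;
    }
}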
