I have an interface for an object factory that creates graphs from a collection of objects given a vertex creation Function<Object,Vertex> and a linking BiPredicate<Vertex,Vertex>.
This design allows for the specification of arbitrary graph connectivity algorithms by supplying both of these functions, but as far as I've been able to implement it, this comes at the cost of having to loop over all pairs of objects in the input collection like this (classes Graph and Vertex are defined elsewhere):
Function<Object,Vertex> maker; // defined by user.
BiPredicate<Vertex,Vertex> linker; // defined by user.
Graph makeGraph( Collection<Object> input ) {
Graph g = new Graph();
Collection<Vertex> vertices = input.stream()
        .map( ( Object t ) -> maker.apply( t ) )
        .collect( Collectors.toList() );
for( Vertex ego : vertices ) {
Collection<Vertex> alters = new ArrayList<>();
alters.addAll( vertices );
alters.remove( ego );
for( Vertex alter : alters ) {
if( linker.test( ego, alter ) ) {
g.makeEdge( ego, alter );
}
}
}
return g;
}
I actually have two questions:
Is there a more elegant way of iterating over all possible pairs (i, j) in a collection than my ugly solution of creating a new list, copying everything, and removing i from the copy?
Can anybody think of a way of optimizing that double iteration? Right now the execution time is O(n^2) even in the best case, because the implementation has to accept an arbitrary linking function without any knowledge about it. Maybe there are ways around this, e.g. supplying parameters that indicate that the iteration can break after the first failure of the linker test (as in a co-occurrence network).
Of course, if anyone can think of an alternative way of going about this, I'd be happy to hear it.
EDIT:
Forget the first question, Robert Navado's answer made me realize that I was wrong.
In order to clarify: I am looking for a way of telling an implementation that the application of the linker function can be optimized under certain conditions (e.g. in the co-occurrence example mentioned above, "sort by position and break after the first negative result").
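One way to express such an optimization hint, purely as a sketch (the LinkerHint class, its fields, and the assumption that the linker is symmetric are all made up for illustration, not part of the original interface), would be to pass an optional strategy object alongside the BiPredicate:
import java.util.*;
import java.util.function.*;

// Hypothetical hint object: tells the factory that vertices may be pre-sorted
// and that the inner loop may stop at the first failed linker test.
class LinkerHint<V> {
    final Comparator<V> order;        // null means "no useful ordering"
    final boolean breakOnFirstFail;   // true e.g. for a co-occurrence window

    LinkerHint( Comparator<V> order, boolean breakOnFirstFail ) {
        this.order = order;
        this.breakOnFirstFail = breakOnFirstFail;
    }
}

class HintedLinking {
    // Sketch of the pair loop honouring the hint; assumes the linker is symmetric,
    // so only pairs with i < j are tested.
    static <V> void link( List<V> vertices, BiPredicate<V,V> linker,
                          LinkerHint<V> hint, BiConsumer<V,V> makeEdge ) {
        if( hint != null && hint.order != null ) {
            vertices.sort( hint.order );
        }
        for( int i = 0; i < vertices.size(); i++ ) {
            for( int j = i + 1; j < vertices.size(); j++ ) {
                if( linker.test( vertices.get( i ), vertices.get( j ) ) ) {
                    makeEdge.accept( vertices.get( i ), vertices.get( j ) );
                } else if( hint != null && hint.breakOnFirstFail ) {
                    break;  // vertices are sorted, so later alters cannot link either
                }
            }
        }
    }
}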
Well, unless you can have unlinked vertices in your graph, and assuming your graph is sparse, I would suggest storing edges rather than vertices.
However, the maximum number of edges in a singly-linked clique is V*(V-1), so in the worst case you'll need O(V^2) iterations to link your graph, and even more for a multigraph.
As for the iteration syntax, the following should work as well:
for( Vertex alter : vertices ) {
    for( Vertex ego : vertices ) {
        // make the decision here (skip the ego == alter case)
    }
}
Take a look at the JUNG library for graph manipulation. It's probably outdated, but you can look at its data structures for inspiration.
Given a tree-structured TinkerPop graph with vertices connected by labeled parent-child relationships ([parent-PARENT_CHILD->child]), what's the idiomatic way to traverse it and find all the nodes below a given vertex?
I'm new to graph traversals, but it seems more or less straightforward to traverse the tree with a recursive function:
Stream<Vertex> depthFirst(Vertex v) {
Stream<Vertex> selfStream = Stream.of(v);
Iterator<Vertex> childIterator = v.vertices(Direction.OUT, PARENT_CHILD);
if (childIterator.hasNext()) {
return selfStream.appendAll(
Stream.ofAll(() -> childIterator)
.flatMap(this::depthFirst)
);
}
return selfStream;
}
(N.b. this example uses Vavr streams, but the Java stream version is similar, just slightly more verbose.)
I assume a graph-native implementation would be more performant, especially on databases other than the in-memory TinkerGraph.
However, when I look at the TinkerPop tree recipes, it's not obvious what combination of repeat() / until() etc. is the right one to do what I want.
If I then want to find only those vertices (leaf or branch) having a certain label, again, I can see how to do it with the function above:
Stream<Vertex> nodesWithMyLabel = depthFirst(root)
.filter(v -> "myLabel".equals(v.label()));
but it's far from obvious that this is efficient, and I assume there must be a better graph-native approach.
If you are using TinkerPop, it is best to just write your traversals with Gremlin. Let's use the tree described in the recipe:
g.addV().property(id, 'A').as('a').
addV().property(id, 'B').as('b').
addV().property(id, 'C').as('c').
addV().property(id, 'D').as('d').
addV().property(id, 'E').as('e').
addV().property(id, 'F').as('f').
addV().property(id, 'G').as('g').
addE('hasParent').from('a').to('b').
addE('hasParent').from('b').to('c').
addE('hasParent').from('d').to('c').
addE('hasParent').from('c').to('e').
addE('hasParent').from('e').to('f').
addE('hasParent').from('g').to('f').iterate()
To find all the children of "A", you simply do:
gremlin> g.V('A').repeat(out()).emit()
==>v[B]
==>v[C]
==>v[E]
==>v[F]
The traversal above basically says, "Start at the 'A' vertex and traverse on out edges until there are no more, and, oh, by the way, emit each of those vertices as you go." If you also want "A" itself to be returned, then you just need to switch things around a bit:
gremlin> g.V('A').emit().repeat(out())
==>v[A]
==>v[B]
==>v[C]
==>v[E]
==>v[F]
Going a step further, if you want to emit only certain vertices based on some filter (in your question you specified label) you can just provide a filtering argument to emit(). In this case, I only emit those vertices that have more than one incoming edge:
gremlin> g.V('A').emit(inE().count().is(gt(1))).repeat(out())
==>v[C]
==>v[F]
Here's what I ended up with, after a certain amount of trial and error:
GraphTraversal<Vertex, Vertex> traversal =
graph.traversal().V(parent)
.repeat(out(PARENT_CHILD)) // follow only edges labeled PARENT_CHILD
.emit()
.hasLabel("myLabel"); // filter for vertices labeled "myLabel"
Note that this is slightly different from the recursive version in the original question since I realized I don't actually want to include the parent in the result. (I think, from the Repeat Step docs, that I could include the parent by putting emit() before repeat(), but I haven't tried it.)
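For completeness, the untried variant from the note above, which should also return the parent vertex by moving emit() before repeat():
GraphTraversal<Vertex, Vertex> traversalWithParent =
    graph.traversal().V(parent)
        .emit()                      // emitting before repeat() also yields the start vertex
        .repeat(out(PARENT_CHILD))
        .hasLabel("myLabel");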
First of all, I'm dealing with graphs of more than 1000 edges, and I'm traversing adjacency lists, as well as vertices, more than 100 times per second. Therefore, I really need an efficient implementation that fits my goals.
My vertices are integers and my edges are undirected, weighted.
I've seen this code.
However, it models the adjacency lists using edge objects, which means I have to spend O(|adj|) time whenever I want to get the adjacent vertices of a vertex, where |adj| is the number of its adjacents.
On the other hand, I'm considering modeling my adjacency lists as Map<Integer, Double>[] adj, mapping each neighbor to the weight of the connecting edge.
With this representation I would just use adj[v], v being the vertex, to get the adjacents of the vertex and iterate over them.
The other method requires something like:
public Set<Integer> adj(int v)
{
Set<Integer> adjacents = new HashSet<>();
for(Edge e: adj[v])
adjacents.add(e.other(v));
return adjacents;
}
My goals are:
I want to sort a subset of vertices by their connectivities (number of adjacents) any time I want.
Also, I need to sort the adjacents of a vertex, by the weights of the edges that connect itself and its neighbors.
I want to do these without using so much space that it slows down the operations. Should I consider using an adjacency matrix?
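To make the Map-based idea concrete, here is a minimal sketch (class and method names are illustrative only) of storing weights per neighbor and sorting neighbors by edge weight:
import java.util.*;
import java.util.stream.*;

class WeightedAdjacency {
    // adj[v] maps each neighbor of v to the weight of the edge (v, neighbor)
    private final Map<Integer, Double>[] adj;

    @SuppressWarnings("unchecked")
    WeightedAdjacency(int vertexCount) {
        adj = new HashMap[vertexCount];
        for (int v = 0; v < vertexCount; v++) adj[v] = new HashMap<>();
    }

    void addEdge(int u, int v, double weight) {  // undirected: store both directions
        adj[u].put(v, weight);
        adj[v].put(u, weight);
    }

    int degree(int v) {                          // "connectivity" of v
        return adj[v].size();
    }

    List<Integer> neighborsByWeight(int v) {     // neighbors sorted by edge weight
        return adj[v].entrySet().stream()
                .sorted(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}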
I've used the JGraphT library for a variety of my own graph representations. They have a weighted graph implementation here: http://jgrapht.org/javadoc/org/jgrapht/graph/SimpleWeightedGraph.html
That seems to handle a lot of what you are looking for, and I've used it to represent graphs with up to around 2000 vertices, and it handles reasonably well for my needs, though I don't remember my access rate.
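As a minimal sketch of how that class is typically used (standard JGraphT API as far as I recall; adjust for the version you're on):
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;

public class WeightedGraphDemo {
    public static void main(String[] args) {
        SimpleWeightedGraph<Integer, DefaultWeightedEdge> g =
                new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
        g.addVertex(1);
        g.addVertex(2);
        DefaultWeightedEdge e = g.addEdge(1, 2);      // undirected, simple edge
        g.setEdgeWeight(e, 0.75);                     // attach the weight
        System.out.println(g.getEdgeWeight(e));       // 0.75
        System.out.println(g.edgesOf(1).size());      // degree of vertex 1
    }
}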
I have some objects in an ArrayList, and I want to perform collision detection and such. Is it okay to do something like:
List<Person> people;
Iterator<Person> iterA0 = people.iterator();
while (iterA0.hasNext()) {
    Person A = iterA0.next();
    Iterator<Person> iterA1 = people.iterator();
    while (iterA1.hasNext()) {
        Person B = iterA1.next();
        A.getsDrunkAndRegretsHookingUpWith(B);
    }
}
That's gotta be terrible coding, right? How would I perform this nested iteration appropriately?
You can iterate over the same list multiple times concurrently, as long as it's not being modified during any of the iterations. For example:
List<Person> people = ...;
for (Person a : people) {
for (Person b : people)
a.getsDrunkAndRegretsHookingUpWith(b);
}
As long as the getsDrunkAndRegretsHookingUpWith method doesn't change the people list, this is all fine.
This is an example of the classic handshake problem. In a room full of n people, there are n choose 2 different possible handshakes. You can't avoid the quadratic runtime.
@Chris's answer shows you a better way to code it.
Re: OP comment
What I've been cooking up is some code where an event causes a particle to explode, which causes all nearby particles to explode...chain reaction. The objects are stored in one list and the non-exploded particles only explode if they are within a defined radius of an exploding ones. So I could dish up some conditionals to make it a bit faster, but still need the n^2 traversal.
You should not use a list to store the particles. If you're modeling particles in 2 dimensions, use a quadtree. If 3 dimensions, an octree.
The number of iterations can be reduced if, in your case:
A.getsDrunkAndRegretsHookingUpWith(B) implies B.getsDrunkAndRegretsHookingUpWith(A) too,
and A.getsDrunkAndRegretsHookingUpWith(A) is always the same for all elements (so self-comparison can be skipped).
Then, instead of using an iterator or a for-each loop, you can take the more traditional indexed approach and skip comparing an element with itself and with the elements it has already been compared against.
List<Person> people;
for (int i=0; i<people.size(); i++){
Person a = people.get(i);
for (int j=i+1; j<people.size();j++){ // compare with next element and onwards in the list
a.getsDrunkAndRegretsHookingUpWith(people.get(j)); // call method
}
}
That's gotta be terrible coding, right?
Most people would probably agree that if you were using Java 5+, and there was no particular need to expose the iterator objects, then you should simplify the code by using a "for each" loop. However, this should make no difference to the performance, and certainly not to the complexity.
Exposing the iterators unnecessarily is certainly not terrible programming. On a scale of 1 to 10 of bad coding style, this is only a 1 or 2. (And anyone who tells you otherwise hasn't seen any truly terrible coding recently ... )
So how do you do an n^2 iteration of a collection against itself?
Your original example is too contrived to give a useful answer to that.
Depending on what the real relation is, you may be able to exploit symmetry / anti-symmetry, or associativity to reduce the amount of work.
However, without that information (or other domain information), you can't improve on this solution.
For your real example (involving particles), you can avoid the O(N^2) comparison problem by dividing the screen into regions; e.g. using quadtrees. Then you iterate over points in the same and adjacent regions.
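A quadtree is the more general structure; as a simpler illustration of the same region idea, here is a minimal uniform-grid sketch (the Particle fields and the choice of cell size are made up for the example) where each particle is only compared against particles in its own and the eight neighbouring cells:
import java.util.*;

class Particle {
    double x, y;                          // position (illustrative fields)
    Particle(double x, double y) { this.x = x; this.y = y; }
}

class SpatialGrid {
    private final double cellSize;        // should be >= the explosion radius
    private final Map<Long, List<Particle>> cells = new HashMap<>();

    SpatialGrid(double cellSize) { this.cellSize = cellSize; }

    private long key(int cx, int cy) { return (((long) cx) << 32) ^ (cy & 0xffffffffL); }

    void add(Particle p) {
        int cx = (int) Math.floor(p.x / cellSize);
        int cy = (int) Math.floor(p.y / cellSize);
        cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(p);
    }

    // Particles in the same cell as p and the 8 surrounding cells.
    List<Particle> nearby(Particle p) {
        int cx = (int) Math.floor(p.x / cellSize);
        int cy = (int) Math.floor(p.y / cellSize);
        List<Particle> result = new ArrayList<>();
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                result.addAll(cells.getOrDefault(key(cx + dx, cy + dy), Collections.emptyList()));
        return result;
    }
}
With a cell size of at least the explosion radius, the chain reaction only ever needs to examine nearby(p) instead of the whole list.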
I've been assigned the following problem, but am having issues figuring it out. I know what I'd like to accomplish, but given the skeleton code he's outlined for us, I'm not sure where to start...I drew a pic to illustrate what I'd like to accomplish (I think...)
http://i802.photobucket.com/albums/yy304/Growler2009/Transposing-1.jpg
This is what I need to do:
Consider a directed graph G(V, A). The transpose of G, written GT(V, AT), is nothing more than G where all the arcs have been transposed, i.e., the origin of each arc becomes the end and the end becomes the origin. In a sense, GT is the "backward" version of G. For this question you must implement an algorithm which, given a directed graph, produces its transpose. The API of the algorithm is given by the following interface:
public interface Transpose<VT,AT> {
public DIGraph<VT,AT> doIt(DIGraph<VT,AT> src);
}
Implement the transpose algorithm given above. You have no restrictions on how to do this (except that it must operate on a graph represented as an adjacency list, and it cannot modify the original graph). Report (in the comments) the space and time complexities in big-O notation and briefly justify them (i.e., how long does it take and how much space does it use to transpose a graph with n vertices and m arcs).
Any help you can offer to get me started would be great.
Thanks!
In pseudocode:
create new empty set of edges E
for I in all edges:
(X,Y) = vertices of edge I ;
insert edge (Y,X) to E;
result is in E.
Space complexity: nothing beyond the output edge set E
Time complexity: O(num. of edges)
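Since the DIGraph API isn't given here, this is only a sketch of the same idea on a plain Map-based adjacency list (the types stand in for whatever DIGraph uses internally); it takes O(n + m) time and O(n + m) space for the new graph:
import java.util.*;

class TransposeSketch {
    // Builds the transpose without modifying the original adjacency list.
    static Map<Integer, List<Integer>> transpose( Map<Integer, List<Integer>> g ) {
        Map<Integer, List<Integer>> gt = new HashMap<>();
        for( Integer v : g.keySet() ) {
            gt.put( v, new ArrayList<>() );                 // keep isolated vertices
        }
        for( Map.Entry<Integer, List<Integer>> e : g.entrySet() ) {
            Integer from = e.getKey();
            for( Integer to : e.getValue() ) {
                gt.computeIfAbsent( to, k -> new ArrayList<>() ).add( from );  // reverse the arc
            }
        }
        return gt;
    }
}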
I need an idea for an efficient index/search algorithm, and/or data structure, for determining whether a time-interval overlaps zero or more time-intervals in a list, keeping in mind that a complete overlap is a special case of partial overlap. So far I've not come up with anything fast or elegant...
Consider a collection of intervals with each interval having 2 dates - start, and end.
Intervals can be large or small, they can overlap each other partially, or not at all. In Java notation, something like this:
interface Period
{
long getStart(); // millis since the epoch
long getEnd();
boolean intersects(Period p); // trivial intersection check with another period
}
Collection<Period> c = new ArrayList<Period>(); // assume a lot of elements
The goal is to efficiently find all intervals which partially intersect a newly-arrived input interval. For c as an ArrayList this could look like...
Collection<Period> getIntersectingPeriods(Period p)
{
// how to implement this without full iteration?
Collection<Period> result = new ArrayList<Period>();
for (Period element : c)
if (element.intersects(p))
result.add(element);
return result;
}
Iterating through the entire list linearly requires too many compares to meet my performance goals. Instead of ArrayList, something better is needed to direct the search, and minimize the number of comparisons.
My best solution so far involves maintaining two sorted lists internally and conducting 4 binary searches and some list iteration for every request. Any better ideas?
Editor's Note: Time-intervals are a specific case employing linear segments along a single axis, be that X, or in this case, T (for time).
Interval trees will do:
In computer science, an interval tree is a tree data structure to hold intervals. Specifically, it allows one to efficiently find all intervals that overlap with any given interval or point. It is often used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. A similar data structure is the segment tree...
Seems the Wiki article solves more than was asked. Are you tied to Java?
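If a full interval tree feels like overkill, and the maximum interval length in the collection is known (or can be maintained on insert), a simpler sketch with a TreeMap keyed by start time bounds the scan; PeriodIndex and its fields are illustrative, only the Period interface comes from the question:
import java.util.*;

class PeriodIndex {
    // One entry per start time; each start may hold several periods.
    private final TreeMap<Long, List<Period>> byStart = new TreeMap<>();
    private long maxLength = 0;   // longest interval seen so far

    void add(Period p) {
        byStart.computeIfAbsent(p.getStart(), k -> new ArrayList<>()).add(p);
        maxLength = Math.max(maxLength, p.getEnd() - p.getStart());
    }

    // Any period intersecting q must start no later than q.getEnd(),
    // and no earlier than q.getStart() - maxLength.
    Collection<Period> getIntersectingPeriods(Period q) {
        Collection<Period> result = new ArrayList<>();
        for (List<Period> bucket :
                byStart.subMap(q.getStart() - maxLength, true, q.getEnd(), true).values()) {
            for (Period candidate : bucket) {
                if (candidate.intersects(q)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }
}
This only pays off when interval lengths are reasonably bounded; the interval tree handles the general case.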
You have a "huge collection of objects" which says to me "Database"
You asked about "built-in period indexing capabilities" and indexing says database to me.
Only you can decide whether this SQL meets your perception of "elegant":
Select A.Key as One_Interval,
B.Key as Other_Interval
From Big_List_Of_Intervals as A join Big_List_Of_Intervals as B
on A.Start between B.Start and B.End OR
B.Start between A.Start and A.End
If the Start and End columns are indexed, a relational database (according to advertising) will be quite efficient at this.