Directed Acyclic Graph Traversal in Java Web Application

I am building a web application in which you can build a directed graph: each node represents an operation and each edge represents data flow between those operations. So for an edge (u, v), u must run before v does. Click this link to see a sample graph.
The START node represents the initial value, and every other node except the output node performs the operation specified on it. The output node outputs the value it receives as input.
Which algorithm should I use to process a graph like that?

This is a perfect example of a topological sort. A common algorithm for producing an ordering that respects these requirements is Kahn's algorithm. The pseudocode can be seen below, and the information in the Wikipedia link should be enough to get you started.
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
Note the "starting node" will be enforced by properly representing the incoming edges in the graph. It will be the only node in S to start. Please let me know in the comments if you would like any other information.
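For reference, here is a minimal Java sketch of the pseudocode above. The class name TopologicalSort and the Map-of-successors representation are my own assumptions, not something from the question. Instead of deleting edges it keeps a count of incoming edges per node, which has the same effect.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: graph is a map from each node to the nodes it feeds into.
class TopologicalSort {
    static <T> List<T> sort(Map<T, List<T>> successors) {
        // Count incoming edges per node (stands in for "remove edge e from the graph").
        Map<T, Integer> inDegree = new HashMap<>();
        for (T node : successors.keySet()) {
            inDegree.putIfAbsent(node, 0);
        }
        for (List<T> targets : successors.values()) {
            for (T target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        // S: all nodes with no incoming edges.
        Deque<T> ready = new ArrayDeque<>();
        for (Map.Entry<T, Integer> entry : inDegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        // L: the sorted result.
        List<T> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            T n = ready.remove();
            order.add(n);
            for (T m : successors.getOrDefault(n, Collections.emptyList())) {
                // m becomes ready once its last incoming edge has been consumed.
                if (inDegree.merge(m, -1, Integer::sum) == 0) {
                    ready.add(m);
                }
            }
        }
        if (order.size() != inDegree.size()) {
            throw new IllegalArgumentException("graph has at least one cycle");
        }
        return order;
    }
}
```

You would then run the operations in the returned order, feeding each node's result to its successors.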

Related

How to check if there exist a path running through all nodes in an adjacency list in Java

I recently created an unweighted bidirectional graph using an adjacency list stored in a HashMap in Java. I have randomly created connections between nodes, and now I am unsure how to check whether there is a single path that passes through every node exactly once.
What is the best way / algorithm to check whether such a path exists?
//Sample
A -> B
B -> A -> C -> D
C -> B -> E
D -> B
E -> C -> G
F -> G
G -> E -> F

The sort of path you’re asking for is called a Hamiltonian path, and unfortunately there are no known algorithms for this problem that run efficiently on all inputs (the problem is NP-complete). You could solve this problem by brute force (list all possible paths and see if any of them goes through every node exactly once). There’s also a famous O(n^2 · 2^n)-time dynamic programming algorithm for this problem.
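As a rough illustration of the dynamic programming idea, here is a small Java sketch over subsets of nodes. The class name HamiltonianPath and the boolean adjacency matrix input are assumptions made for the example, and it is only practical for small graphs (around 20 nodes) because the table has 2^n rows.

```java
// Hypothetical helper: adj[u][v] is true when nodes u and v are connected.
class HamiltonianPath {
    static boolean exists(boolean[][] adj) {
        int n = adj.length;
        // reachable[mask][v]: some path visits exactly the nodes in mask, each once, and ends at v.
        boolean[][] reachable = new boolean[1 << n][n];
        for (int v = 0; v < n; v++) {
            reachable[1 << v][v] = true; // a path consisting of the single node v
        }
        for (int mask = 1; mask < (1 << n); mask++) {
            for (int v = 0; v < n; v++) {
                if (!reachable[mask][v]) {
                    continue;
                }
                // Try to extend the path by one node w that is not on it yet.
                for (int w = 0; w < n; w++) {
                    if ((mask & (1 << w)) == 0 && adj[v][w]) {
                        reachable[mask | (1 << w)][w] = true;
                    }
                }
            }
        }
        int full = (1 << n) - 1;
        for (int v = 0; v < n; v++) {
            if (reachable[full][v]) {
                return true;
            }
        }
        return false;
    }
}
```

For the sample adjacency list in the question this returns false: A, D and F each have only one neighbour, and a path can have at most two end nodes.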

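You can use a DFS or BFS to traverse the graph and check whether each of the nodes is visited at least once. (Note that this only tells you the graph is connected, which is necessary but not sufficient for a path that visits every node exactly once.)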

If you only want to know whether the graph is connected or not then you can do the following:
Use BFS/DFS to traverse the graph starting from any of the nodes. Whenever you encounter a new node (including the first one), increase a counter by 1. This gives you the number of nodes connected to your starting node, the starting node included.
Compare that counter with the size of the map (number of keys). If the number of keys is greater than the number of nodes traversed, the graph is disconnected; if the two numbers are equal, it is connected.
It's not really clear what your data structure is, because you're talking about a HashMap but showing something else. Generally, BFS/DFS run in O(|V| + |E|) (where |V| is the number of vertices and |E| is the number of edges), and size() is O(1) for a HashMap.
If you want to know which nodes are not connected to your starting one, then:
Store or mark all the nodes you traverse. This gives you all the nodes connected to the node from which you started the traversal.
Iterate over all the nodes and find those which were not marked in the previous step.
A membership check (containsKey) is also O(1) on average for a HashMap, though you would have to run it |V|-1 times.
There's a similar (more theory oriented) question on Computer Science: How to check whether a graph is connected in polynomial time?
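Here is a short Java sketch of both checks described above, assuming the graph is a Map from each node to its list of neighbours and that every node appears as a key. The class and method names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper; assumes every node is a key of the adjacency map.
class Connectivity {
    // BFS from start, returning every node that can be reached (start included).
    static <T> Set<T> reachableFrom(Map<T, List<T>> graph, T start) {
        Set<T> visited = new HashSet<>();
        Deque<T> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            T node = queue.remove();
            for (T next : graph.getOrDefault(node, Collections.emptyList())) {
                if (visited.add(next)) { // add() returns false if already visited
                    queue.add(next);
                }
            }
        }
        return visited;
    }

    // Connected when the traversal reaches as many nodes as the map has keys.
    static <T> boolean isConnected(Map<T, List<T>> graph) {
        if (graph.isEmpty()) {
            return true;
        }
        T start = graph.keySet().iterator().next();
        return reachableFrom(graph, start).size() == graph.size();
    }

    // Nodes that are not connected to start.
    static <T> Set<T> unreachableFrom(Map<T, List<T>> graph, T start) {
        Set<T> rest = new HashSet<>(graph.keySet());
        rest.removeAll(reachableFrom(graph, start));
        return rest;
    }
}
```

The visited set doubles as the counter: its size is the number of nodes reached.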

Finding segments / clusters in graph efficiently

I have a graph represented using an array: [4,7,3,2,...] where the element at index i is the node that node i points to. For instance, node i=0 has a directed edge to node 4.
I want to find all segments / clusters of nodes. A segment is a set of nodes that are connected, either directly or through other nodes. For instance, for the array [3,2,1,0], there are two segments, 0-3 and 1-2. Hence, the result should be in the format {0:[0,3], 1:[1,2]}.
I have already implemented an algorithm using the following overall structure:
while (numberOfNodesToCheck != 0):
    for:...
        while true:
            finding all nodes a node can reach
            break if no more nodes to reach
Does anybody have a more clever solution method?
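One possible approach (not from the original thread) is a union-find structure: merge node i with the node it points to, then group the nodes by their representative. The sketch below assumes the edges should be treated as undirected for the purpose of forming segments; the class name Segments is made up, and the result is a list of segments rather than the exact map format asked for.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: next[i] is the node that node i points to.
class Segments {
    static List<List<Integer>> find(int[] next) {
        int n = next.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every node starts in its own segment
        }
        for (int i = 0; i < n; i++) {
            union(parent, i, next[i]); // i and the node it points to share a segment
        }
        // Group nodes by the representative of their segment, in order of first appearance.
        Map<Integer, List<Integer>> byRoot = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            byRoot.computeIfAbsent(root(parent, i), k -> new ArrayList<>()).add(i);
        }
        return new ArrayList<>(byRoot.values());
    }

    private static int root(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps the chains short
            x = parent[x];
        }
        return x;
    }

    private static void union(int[] parent, int a, int b) {
        parent[root(parent, a)] = root(parent, b);
    }
}
```

For the array [3,2,1,0] this yields [[0, 3], [1, 2]]. Each node is merged once and looked up once, so it avoids the nested loops of the structure above.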

Loop in a binary tree represented in adjacency list form

I attended an interview where I was asked the following question:
You are given parent -----> child relationships, i.e. N1 ---> N2 where N1 is the parent of N2. This is nothing but a binary tree represented in adjacency list form. I had to find whether a loop is present or not.
I have been mulling over this and came up with a solution:
Check each node individually, e.g. N1: go deep from it, and if you see an edge coming back to N1, print it; otherwise move on to the next node. But the interviewer told me that this is not very efficient. Can somebody help me find an efficient solution? Thanks.

You can do it in a simple manner:
A binary tree of N nodes has N-1 edges. If it has a loop, then a (connected) graph with N nodes will have more than N-1 edges.
So you need to count the number of nodes and edges (parent->child): if no_of_edges == no_of_nodes - 1 then there is no loop, otherwise there is a loop.
Hope you understand.

A tree is a graph. So you can apply any graph traversal algorithm, for instance BFS (breadth-first search) or DFS (depth-first search).
Both algorithms use O(V) memory to store the list of visited vertices (V - # of vertices) and take O(V + E) time to complete (E - # of edges, between V and V^2). So basically in both algorithms you need to explore all edges once.
The algorithm to identify loops in your tree is as simple as:
1) Take root node
2) Traverse through graph (breadth-first or depth-first) remembering visited nodes
3) If you visit a node that has already been visited, then you have a loop. Increment your loop counter, backtrack and go to 2
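A minimal Java sketch of these steps, assuming the parent -> child relationships are stored as a Map from each parent to its list of children and that the root is known. The class name LoopByTraversal is hypothetical, and for simplicity it stops at the first repeated node instead of counting loops.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: children maps each parent to its child nodes.
class LoopByTraversal {
    static <T> boolean hasLoop(Map<T, List<T>> children, T root) {
        Set<T> visited = new HashSet<>();
        Deque<T> stack = new ArrayDeque<>();
        visited.add(root);
        stack.push(root);
        while (!stack.isEmpty()) {
            T node = stack.pop();
            for (T child : children.getOrDefault(node, Collections.emptyList())) {
                // add() returns false when the child was already visited.
                if (!visited.add(child)) {
                    return true;
                }
                stack.push(child);
            }
        }
        return false;
    }
}
```

Note that this also reports a node that is reached through two different parents, which is not a cycle in the strict directed sense but still means the input is not a valid tree.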

You need to:
check that the graph is connected.
check that there are exactly E+1 vertices when you're given E edges.
You can apply Kruskal's algorithm (or just the union-find structure it is built on) to identify the connected components of the graph. After you've applied it, you can check both that you have exactly one connected component and that the number of vertices is E+1.
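As a sketch of this idea, the Java snippet below uses only the union-find part that Kruskal's algorithm relies on. It assumes nodes are labelled 0..n-1 and edges are given as {parent, child} pairs; the class name TreeCheck is made up for the example.

```java
// Hypothetical helper: nodes are 0..n-1, each edge is a {parent, child} pair.
class TreeCheck {
    static boolean isTree(int n, int[][] edges) {
        // A tree with n vertices has exactly n - 1 edges.
        if (edges.length != n - 1) {
            return false;
        }
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        int components = n;
        for (int[] edge : edges) {
            int a = find(parent, edge[0]);
            int b = find(parent, edge[1]);
            if (a == b) {
                return false; // both ends already connected: this edge closes a loop
            }
            parent[a] = b;
            components--;
        }
        return components == 1;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```

With the edge count already checked, the loop test and the single-component test agree, so the final comparison is kept mainly for readability.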

How to generate random graphs?

I want to be able to generate random, undirected, and connected graphs in Java. In addition, I want to be able to control the maximum number of vertices in the graph. I am not sure what would be the best way to approach this problem, but here are a few I can think of:
(1) Generate a number between 0 and n and let that be the number of vertices. Then, somehow randomly link vertices together (maybe generate a random number per vertex and let that be the number of edges coming out of said vertex). Traverse the graph starting from an arbitrary vertex (say with Breadth-First-Search) and let our random graph G be all the visited nodes (this way, we make sure that G is connected).
(2) Generate a random square matrix (of 0's and 1's) with side length between 0 and n (somehow). This would be the adjacency matrix for our graph (the diagonal of the matrix should then either be all 1's or all 0's). Make a data structure from the graph and traverse the graph from any node to get a connected list of nodes and call that the graph G.
Any other way to generate a sufficiently random graph is welcomed. Note: I do not need a purely random graph, i.e., the graph you generate doesn't have to have any special mathematical properties (like uniformity of some sort). I simply need lots and lots of graphs for testing purposes of something else.
Here is the Java Node class I am using:
public class Node<T> {
    T data;
    // Parameterized with T instead of the raw Node type.
    ArrayList<Node<T>> children = new ArrayList<Node<T>>();
    ...}
Here is the Graph class I am using (you can tell why I am only interested in connected graphs at the moment):
public class Graph {
    Node mainNode;
    ArrayList<Node> V = new ArrayList<Node>();

    public Graph(Node node) {
        mainNode = node;
    }
    ...}
As an example, this is how I make graphs for testing purposes right now:
// The following makes a "kite" graph G (with "a" as the main node).
/* a-b
   |/|
   c-d
*/
Node<String> a = new Node<>("a");
Node<String> b = new Node<>("b");
Node<String> c = new Node<>("c");
Node<String> d = new Node<>("d");
// Each undirected edge is added in both directions.
a.addChild(b);
a.addChild(c);
b.addChild(a);
b.addChild(c);
b.addChild(d);
c.addChild(a);
c.addChild(b);
c.addChild(d);
d.addChild(c);
d.addChild(b);
Graph G1 = new Graph(a);

Whatever you want to do with your graph, I guess its density is also an important parameter. Otherwise, you'd just generate a set of small cliques (complete graphs) using random sizes, and then connect them randomly.
If I'm correct, I'd advise you to use the Erdős-Rényi model: it's simple, not far from what you originally proposed, and allows you to control the graph density (so, basically: the number of links).
Here's a short description of this model:
Define a probability value p (the higher p, the denser the graph: 0 = no links, 1 = fully connected graph);
Create your n nodes (as objects, as an adjacency matrix, or anything that suits you);
Each pair of nodes is connected with an (independent) probability p. So, you have to decide whether a link exists between them using this probability p. For example, you could randomly draw a value q between 0 and 1 and create the link iff q < p. Then do the same thing for each possible pair of nodes in the graph.
With this model, if your p is large enough, then it's highly probable your graph is connected (cf. the Wikipedia reference for details). In any case, if you have several components, you can also force its connectedness by creating links between nodes of distinct components. First, you have to identify each component by performing breadth-first searches (one for each component). Then, you select pairs of nodes in two distinct components, create a link between them and consider both components as merged. You repeat this process until you've got a single component remaining.
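To make this concrete, here is a Java sketch of the model using a boolean adjacency matrix, followed by the component-merging step. The names ErdosRenyi, generate and connect are my own, and picking random endpoints for the linking edges is just one possible choice.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Hypothetical helper built around a symmetric adjacency matrix.
class ErdosRenyi {
    // Each of the n*(n-1)/2 possible links is created independently with probability p.
    static boolean[][] generate(int n, double p, Random rng) {
        boolean[][] adj = new boolean[n][n];
        for (int u = 0; u < n; u++) {
            for (int v = u + 1; v < n; v++) {
                if (rng.nextDouble() < p) {
                    adj[u][v] = true;
                    adj[v][u] = true;
                }
            }
        }
        return adj;
    }

    // Finds components by BFS and links each newly found one to the part merged so far.
    static void connect(boolean[][] adj, Random rng) {
        int n = adj.length;
        boolean[] seen = new boolean[n];
        List<Integer> merged = new ArrayList<>();
        for (int s = 0; s < n; s++) {
            if (seen[s]) {
                continue;
            }
            List<Integer> component = new ArrayList<>();
            Deque<Integer> queue = new ArrayDeque<>();
            seen[s] = true;
            queue.add(s);
            while (!queue.isEmpty()) {
                int u = queue.remove();
                component.add(u);
                for (int v = 0; v < n; v++) {
                    if (adj[u][v] && !seen[v]) {
                        seen[v] = true;
                        queue.add(v);
                    }
                }
            }
            if (!merged.isEmpty()) {
                // One link between a random merged node and a random node of the new component.
                int a = merged.get(rng.nextInt(merged.size()));
                int b = component.get(rng.nextInt(component.size()));
                adj[a][b] = true;
                adj[b][a] = true;
            }
            merged.addAll(component);
        }
    }
}
```

Passing a seeded Random (for example new Random(42)) makes the generated test graphs reproducible.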

The only tricky part is ensuring that the final graph is connected. To do that, you can use a disjoint set data structure. Keep track of the number of components, initially n. Repeatedly pick pairs of random vertices u and v, adding the edge (u, v) to the graph and to the disjoint set structure, and decrementing the component count when that structure tells you u and v belonged to different components. Stop when the component count reaches 1. (Note that using an adjacency matrix simplifies managing the case where the edge (u, v) is already present in the graph: in this case, adj[u][v] will be set to 1 a second time, which as desired has no effect.)
If you find this creates graphs that are too dense (or too sparse), then you can use another random number to add edges only k% of the time when the endpoints are already part of the same component (or when they are part of different components), for some k.
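A compact Java sketch of this procedure, using a plain int array as the disjoint set structure. The class name RandomConnectedGraph is made up, and skipping self-loops is an assumption of mine; the density tweak with k is left out.

```java
import java.util.Random;

// Hypothetical helper: returns a symmetric adjacency matrix of a connected graph.
class RandomConnectedGraph {
    static boolean[][] generate(int n, Random rng) {
        boolean[][] adj = new boolean[n][n];
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every vertex starts as its own component
        }
        int components = n;
        while (components > 1) {
            int u = rng.nextInt(n);
            int v = rng.nextInt(n);
            if (u == v) {
                continue; // assumption: no self-loops
            }
            // Setting an existing edge again has no effect.
            adj[u][v] = true;
            adj[v][u] = true;
            int ru = find(parent, u);
            int rv = find(parent, v);
            if (ru != rv) {
                parent[ru] = rv; // the edge joined two components
                components--;
            }
        }
        return adj;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }
}
```

The loop stops as soon as a single component remains, so the result is connected by construction.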

The following paper proposes an algorithm that uniformly samples connected random graphs with a prescribed degree sequence, with an efficient implementation. It is available in several libraries, like NetworKit or igraph.
Fast generation of random connected graphs with prescribed degrees.
Fabien Viger, Matthieu Latapy
Be careful when you make simulations on random graphs: if they are not sampled uniformly, then they may have hidden properties that impact simulations; alternatively, uniformly sampled graphs may be very different from the ones your code will meet in practice...

Graph road complexity

I need to write an algorithm that verifies whether there is a road from node x to node y in a graph. The edges in the graph have a set of rights attached to them (like r, w, e, etc.). My algorithm needs to have |e| + |v| complexity. I can only go through nodes whose edge from the previous node has a certain set of rights, given as a parameter.
For example, if I have a set of rights r, w, e, g distributed randomly over the edges, and I pass the set of rights e, g to my search method, I can only go through nodes whose edges have the rights e, g.
How can I do this in |e| + |v| time? If I recall correctly, DFS already has |e| + |v| time complexity, and I also need to check whether the edges have the desired set of rights, which I think adds to the complexity.

You need to apply breadth-first search (unlike DFS, it will find the shortest path), modifying it slightly so that it only follows edges which carry the required rights.
Here is the pseudo-code, I'm sure you can translate it to Java:
procedure BFS(G,v):
    create a queue Q
    create a set V
    enqueue v onto Q
    add v to V
    while Q is not empty:
        t ← Q.dequeue()
        if t is what we are looking for:
            return t
        for all edges e in G.adjacentEdges(t) do
            u ← G.adjacentVertex(t,e)
            if u is not in V and e.hasRights(allowedRights):
                add u to V
                enqueue u onto Q
    return none
It differs from the one on Wikipedia only by checking the e.hasRights(allowedRights) condition.
Using a Java HashSet, checking a set of rights can easily be done in O(1) time, adding nothing to the E+V complexity of the BFS algorithm (assuming the number of available rights is constant).
On each edge you store a set of rights, and then check whether all required rights are in the set (HashSet.contains(Object) is O(1)).
Also, you can represent your rights as enum and use EnumSet to store the right sets. EnumSet is implemented as bit vectors and so is as fast as you can get with sets.
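A possible Java translation, assuming the rights are attached to edges as described in the question and stored in an EnumSet. The Right enum, the Edge class and the hasRoad method are all names invented for this sketch, and nodes are assumed to be numbered 0..n-1.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical types: graph.get(t) lists the outgoing edges of node t.
class RightsSearch {
    enum Right { R, W, E, G }

    static final class Edge {
        final int to;
        final EnumSet<Right> rights;

        Edge(int to, EnumSet<Right> rights) {
            this.to = to;
            this.rights = rights;
        }
    }

    // BFS that only follows edges carrying every required right.
    static boolean hasRoad(List<List<Edge>> graph, int from, int target, Set<Right> required) {
        boolean[] visited = new boolean[graph.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        visited[from] = true;
        queue.add(from);
        while (!queue.isEmpty()) {
            int t = queue.remove();
            if (t == target) {
                return true;
            }
            for (Edge e : graph.get(t)) {
                // The rights check costs time proportional to the (constant) number of rights.
                if (!visited[e.to] && e.rights.containsAll(required)) {
                    visited[e.to] = true;
                    queue.add(e.to);
                }
            }
        }
        return false;
    }
}
```

Each edge is inspected at most once and the check per edge is bounded by the fixed number of rights, so the overall cost stays proportional to |e| + |v|.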
