Finding segments / clusters in graph efficiently

Finding segments / clusters in graph efficiently - java

I have a graph represented using an array: [4,7,3,2,...] where each element indicates which edge index i points to. For instance, node i=0 has a directed edge to node i=4.
I want to find all segments / clusters of nodes. A segment of nodes are nodes that are connected, either directly or through other nodes. For instance, for the array [3,2,1,0], there are two segments, 0-3 and 1-2. Hence, the result should be in the format = {0:[0,3], 1:[1,2]}.
I have already implemented an algorithm using the following overall structure:
while (numberOfNodesToCheck !=0):
for:...
while true:
finding all nodes a node can reach
break if no more nodes to reach
Does anybody have a more clever solution method?

Related

Finding Index of a Node in a Huffman Tree

Assume we have a tree of nodes (Huffman-tree) that hold string values in them. How would I go about traversing the tree and spitting out an index of a specific node if I had a tree like this? The numbers I drew inside the circles would be the index I want (especially 12 or 13).
Note: Due to miscommunication, I will repeat: the #'s I wrote inside the circles are not the values that the nodes hold. They are the index of that node. My problem was that I couldn't find the index, since the trees are structured weirdly - not a classic BST tree, and the values inside aren't numerical.
Edit: I redrew the image to make my question more clear.
Either way, I figured it out. I'll write up the answer after my finals.

The tree you are showing is not a binary search tree. The central property of a binary search tree that allows it to be searched efficiently is that the left descendants of a node are smaller, and the right descendants bigger than the node itself (in terms of the index value).
If you have a proper binary search tree, you can find a node with given index by comparing with nodes and following the corresponding branch, starting with the root.

How to compare all elements between two arrays?

I'm attempting to write a program which can identify all nodes in a graph that don't share any common neighbors, and in which all vertices are contained within the various subsystems in the graph. Assume all nodes are numerically labeled for simplicity.
For example, in a graph of a cube, the furthest corners share no common nodes and are part of subsystems that together contain all vertices.
I'm looking to write a program that compares each potential subsystem against all other potential subsystems, regardless of the graph, number of nodes or sides, and finds groups of subsystems whose central nodes don't share common neighbors. For simplicity's sake, assume the graphs aren't usually symmetrical, unlike the cube example, as this introduces functionally equivalent systems. However, the number of nodes in a subsystem, or elements in an array, can vary.
The goal for the program is to find a set of central nodes whose neighbors are each unique to the subsystem, that is no neighbor appears in another subsystem. This would also mean that the total number of nodes in all subsystems, central nodes and neighbors together, would equal the total number of vertices in the graph.
My original plan was to use a 2d array, where rows act as stand-ins for the subsystems. It would compare individual elements in an array against all other elements in all other arrays. If two arrays contain no similar elements, then index the compared array and its central node is recorded, otherwise it is discarded for this iteration. After the program has finished iterating through the 2d array against the first row, it adds up the number of elements from all recorded rows to see if all nodes in the graph are represented. So if a graph contains x nodes, and the number of elements in the recorded rows is less than x, then the program iterates down one row to the next subsystem and compares all values in the row against all other values like before.
Eventually, this system should print out which nodes can make up a group of subsystems that encompass all vertices and whose central nodes share no common neighbors.
My limited exposure to CS makes a task like this daunting, as it's just my way of solving puzzles presented by my professor. I'd find the systems by hand through guess-and-check methods, but with a 60+ node array...
Thanks for any help, and simply pointers in the right direction would be very much appreciated.

I don't have a good solution (and maybe there exists none; it sounds very close to vertex cover). So you may need to resort backtracking. The idea is the following:
Keep a list of uncovered vertices and a list of potential central node candidates. Both lists initially contain all vertices. Then start placing a random central node from the candidate list. This will erase the node and its one-ring from the uncovered list and the node plus its one-ring and two-ring from the candidate list. Do this until the uncovered list is empty or you run out of candidates. If you make a mistake, revert the last step (and possibly more).
In pseudo-code, this looks as follows:
findSolution(uncoveredVertices : list, centralNodeCandidates : list, centralNodes : list)
if uncoveredVertices is empty
return centralNodes //we have found a valid partitioning
if centralNodeCandidates is empty
return [failure] //we cannot place more central nodes
for every n in centralNodeCandidates
newUncoveredVertices <- uncoveredVertices \ { n } \ one-ring of n
newCentralNodeCandidates <- centralNodeCandidates \ { n } \ one-ring of n \ two-ring of n
newCentralNodes = centralNodes u { n }
subProblemSolution = findSolution(newUncoveredVertices, newCentralNodeCandidates, newCentralNodes)
if subProblemSolution is not [failure]
return subProblemSolution
next
return [failure] //none of the possible routes to go yielded a valid solution
Here, \ is the set minus operator and u is set union.
There are several possible optimizations:
If you know the maximum number of nodes, you can represent the lists as bitmaps (for a maximum of 64 nodes, this even fits into a 64 bit integer).
You may end up checking the same state multiple times. To avoid this, you may want to cache the states that resulted in failures. This is in the spirit of dynamic programming.

Directed Acyclic Graph Traversal in Java Web Application

So I am building a web application where you can build a directed graph where a node will represent some operation and the edge will represent data flow between those operations. So for an edge {u,v} , u must run before v does. Click this link to see a sample graph
START node represents the initial value and the other nodes except the output does the operation as specified. Output node will output the value it receives as input.
Which algorithm approach should i use to process a graph like that ?

This is a perfect example of a topological sort. The most common algorithm for creating a set following the order requirements via traversing is Kahn's Algorithm. The pseudocode can be seen below and the information in the Wikipedia link should be enough to get you started.
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
remove edge e from the graph
if m has no other incoming edges then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
Note the "starting node" will be enforced by properly representing the incoming edges in the graph. It will be the only node in S to start. Please let me know in the comments if you would like any other information.

Traverse a tree represented by its edges

My tree is represented by its edges and the root node. The edge list is undirected.
char[][] edges =new char[][]{
new char[]{'D','B'},
new char[]{'A','C'},
new char[]{'B','A'}
};
char root='A';
The tree is
A
B C
D
How do I do depth first traversal on this tree? What is the time complexity?
I know time complexity of depth first traversal on linked nodes is O(n). But if the tree is represented by edges, I feel the time complexity is O(n^2). Am I wrong?
Giving code is appreciated, although I know it looks like homework assignment..

The general template behind DFS looks something like this:
function DFS(node) {
if (!node.visited) {
node.visited = true;
for (each edge {node, v}) {
DFS(v);
}
}
}
If you have your edges represented as a list of all the edges in the graph, then you could implement the for loop by iterating across all the edges in the graph and, every time you find one with the current node as its source, following the edge to its endpoint and running a DFS from there. If you do this, then you'll do O(m) work per node in the graph (here, m is the number of edges), so the runtime will be O(mn), since you'll do this at most once per node in the graph. In a tree, the number of edges is always O(n), so for a tree the runtime is O(n2).
That said, if you have a tree and there are only n edges, you can speed this up in a bunch of ways. First, you could consider doing an O(n log n) preprocessing step to sort the array of edges. Then, you can find all the edges leaving a given node by doing a binary search to find the first edge leaving the node, then iterating across the edges starting there to find just the edges leaving the node. This improves the runtime quite a bit: you do O(log n) work per node for the binary search, and then every edge gets visited only once. This means that the runtime is O(n log n). Since you've mentioned that the edges are undirected, you'll actually need to create two different copies of the edges array - one that's the original one, and one with the edges reversed - and should sort each one independently. The fact that DFS marks visited nodes along the way means that you don't need to do any extra bookkeeping here to figure out which direction you should go at each step, and this doesn't change the overall time complexity, though it does increase the space usage.
Alternatively, you could use a hashing-based solution. Before doing the DFS, iterate across the edges and convert them into a hash table whose keys are the nodes and whose values are lists of the edges leaving that node. This will take expected time O(n). You can then implement the "for each edge" step quite efficiently by just doing a hash table lookup to find the edges in question. This reduces the time to (expected) O(n), though the space usage goes up to O(n) as well. Since your edges are undirected, as you populate the table, just be sure to insert the edge in each direction.

Loop in a binary tree represented in adjacency list form

I attained an interview where I was asked a question as below:
You are given with parent -----> child relationships i.e. N1 --->
N2 where N1 is the parent of N2. This is nothing but representing a
binary tree in adjacency list form. So I had to find whether there is a loop is present or not.
I have been mulling over this and came up with a solution:
Just need to check individual node i.e. N1 and try going deep if you see there is a edge coming back to N1 then print. Else go for next node. But the interviewer told me that it is not very efficient, can somebody help me in finding an efficient solution. Thanks.

You can do in a simple manner:
In a binary tree of N nodes will have N-1 edges. If it has loop then a binary tree having N nodes will have more than N-1 edges.
So you need to calculate the no of nodes and edges (parent->child) if the no_of_nodes == no_of_edges-1 than no loop else has loop.
Hope you understand.

Tree is a graph. So you should apply any algorithm for graph traversing, for instance BFS (breadth-first search) or DFS (depth-first search).
Both algorithms use O(V) of memory to store list of vertices (V - # of vertices) and take O(E) time to complete (E - # of edges, between V and V^2). So basically in both algorithms you need to explore all edges once.
The algorithm to identify loops in your tree is as simple as:
1) Take root node
2) Traverse through graph (breadth-first or depth-first) remembering visited nodes
3) If you visit a node that has already been visited, then you have a loop. Increment your loop counter, backtrack and go to 2

You need to:
check the graph is connected.
check that there's exactly E+1 vertices when you're given E edges.
You can apply Kruskal's algorithm to identify the connected components of the graph. After you've applied it, you can both check that you have exactly one connected component, and that the number of vertices is E+1.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.