Strongly connected component - java

Given only the total number of vertices V and the total number of edges E of a graph G, how can I compute the minimum possible size (in vertices and edges) of the largest strongly connected component of the graph?
E.g. for a graph with 5 vertices and 7 edges, the minimum possible size of the largest connected component is 3. The graph could be arranged so that the components come out differently, but the minimum is 3.
The problem I am facing is that the edge information is not given to me; only the total number of edges is given. I wanted to use Tarjan's algorithm with depth-first search, but that needs the actual edge information.
Is it possible to find the size of the strongly connected component with minimum vertices and edges given just the total number of vertices and the total number of edges?

The simplest way I can think of is to try to use the edges to form components that are complete graphs of size 1, then 2, and so on, until you find a size that works.
A complete graph of size n has (n^2 - n) / 2 = n(n - 1) / 2 edges. So splitting into components of size n will look something like:
numFullComponents := V / n                               //integer division
numEdgesUsed := numFullComponents * n * (n - 1) / 2      //edges from the full components
             + (V % n) * (V % n - 1) / 2                 //a complete graph on the remainder nodes
You can then compare that number of edges used with the number of edges in the graph. The smallest n that uses at least E edges is the solution.
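For example, a direct Java translation of that pseudocode might look like the sketch below (the class and method names and the use of long arithmetic are my own choices):

class MinComponentSize {
    // Smallest component size n such that packing the V vertices into
    // complete graphs of size n can absorb at least E edges.
    static int minComponentSize(long V, long E) {
        for (int n = 1; n <= V; n++) {
            long fullComponents = V / n;                       // integer division
            long remainder = V % n;
            long edgesUsed = fullComponents * n * (n - 1) / 2  // edges inside the full components
                           + remainder * (remainder - 1) / 2;  // complete graph on the leftover vertices
            if (edgesUsed >= E) {
                return n;                                      // smallest n that can hold all E edges
            }
        }
        return (int) V;                                        // fallback: everything in one component
    }
}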

Related

How priority queue is used with heap to solve min distance

Please bear with me, I am very new to data structures.
I am getting confused about how a priority queue is used to solve a minimum-distance problem. For example, if I have a matrix and want to find the minimum distance from a source to a destination, I know I would run Dijkstra's algorithm, and with a queue I can easily find the distance from the source to every element in the matrix.
However, I am confused about how a heap + priority queue is used here. For example, say I start at (1,1) on a grid and want to find the minimum distance to (3,3). I know how to implement the algorithm in the sense of finding the neighbours, checking the distances, and marking cells as visited. But I have read about priority queues and min-heaps and want to use them.
Right now, my only understanding is that a priority queue uses a key to position elements. My issue is that when I insert the first neighbours (1,0), (0,0), (2,1), (1,2), they are inserted into the PQ based on a key (which would be the distance in this case). So the next cell to search would be the element in the matrix with the shortest distance. But with the PQ, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This seems to go against the 2*i, 2*i + 1, and i/2 index rules.
In conclusion, I don't understand how a min-heap + priority queue works for finding the minimum of something like distance.
     0 1 2 3
     _ _ _ _
0 - |2|1|3|2|
1 - |1|3|5|1|
2 - |5|2|1|4|
3 - |2|4|2|1|
You need the priority queue to get the minimum-weight option at every move, so a MinPQ is the right fit for this.
A MinPQ internally uses the heap technique to keep elements in the right position, with operations such as sink() and swim().
So the MinPQ is the data structure, and the heap is the technique it uses internally.
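For reference, the heap technique behind such a MinPQ looks roughly like the sketch below (a simplified array-based min-heap of int keys with swim()/sink(), in the style of Sedgewick's MinPQ; this is illustrative, not the actual library class):

class MinPQ {
    private int[] keys = new int[8];   // 1-indexed storage: keys[1..n]
    private int n = 0;                 // number of stored keys

    void insert(int key) {
        if (n + 1 == keys.length) keys = java.util.Arrays.copyOf(keys, 2 * keys.length);
        keys[++n] = key;               // put the new key at the end
        swim(n);                       // and float it up to its place
    }

    int delMin() {                     // assumes the queue is not empty
        int min = keys[1];
        keys[1] = keys[n--];           // move the last key to the root
        sink(1);                       // and sink it down to its place
        return min;
    }

    private void swim(int k) {
        while (k > 1 && keys[k / 2] > keys[k]) { swap(k, k / 2); k /= 2; }
    }

    private void sink(int k) {
        while (2 * k <= n) {
            int child = 2 * k;                                       // left child
            if (child < n && keys[child + 1] < keys[child]) child++; // pick the smaller child
            if (keys[k] <= keys[child]) break;
            swap(k, child);
            k = child;
        }
    }

    private void swap(int a, int b) { int t = keys[a]; keys[a] = keys[b]; keys[b] = t; }
}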
If I'm interpreting your question correctly, you're getting stuck at this point:
But with the PQ, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This seems to go against the 2*i, 2*i + 1, and i/2 index rules.
It sounds like what's tripping you up is that there are two separate concepts here that you may be conflating. First, there's the notion of "two places in a grid might be next to one another." In that world, you have (up to) four neighbours for each location. Second, there's the shape of the binary heap, in which each node has two children whose locations are given by certain arithmetic on array indices. Those are completely independent of one another: the binary heap has no idea that the items it's storing come from a grid, and the grid has no idea that there's an array where each node has two children stored at particular positions.
For example, suppose you want to store locations (0, 0), (2, 0), (-2, 0) and (0, 2) in a binary heap, and that the weights of those locations are 1, 2, 3, and 4, respectively. Then the shape of the binary heap might look like this:
            (0, 0)
           Weight 1
          /        \
    (2, 0)          (0, 2)
   Weight 2        Weight 4
      /
  (-2, 0)
  Weight 3
This tree still gives each node two children; those children just don't necessarily map back to the relative positions of nodes in the grid.
More generally, treat the priority queue as a black box. Imagine that it's just a magic device that says "you can give me some new thing to store" and "I can give you the cheapest thing you've given me so far," and that's it. The fact that, internally, it coincidentally happens to be implemented as a binary heap is essentially irrelevant.
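To make that black-box view concrete, here is a minimal sketch of Dijkstra on a weighted grid using java.util.PriorityQueue. The grid representation, the choice to count the start cell's own weight, and all names are my own assumptions rather than anything from the question:

import java.util.Arrays;
import java.util.PriorityQueue;

class GridDijkstra {
    // Minimum-cost path from (sr, sc) to (tr, tc), where each cell's weight is its entry cost.
    static int minDistance(int[][] weight, int sr, int sc, int tr, int tc) {
        int rows = weight.length, cols = weight[0].length;
        int[][] dist = new int[rows][cols];
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        // Each queue entry is {distance, row, col}; the queue orders them by distance only.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        dist[sr][sc] = weight[sr][sc];
        pq.add(new int[] {dist[sr][sc], sr, sc});
        int[][] moves = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int d = cur[0], r = cur[1], c = cur[2];
            if (r == tr && c == tc) return d;      // cheapest path to the target found
            if (d > dist[r][c]) continue;          // stale entry, a cheaper one was already processed
            for (int[] m : moves) {
                int nr = r + m[0], nc = c + m[1];
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                int nd = d + weight[nr][nc];
                if (nd < dist[nr][nc]) {
                    dist[nr][nc] = nd;
                    pq.add(new int[] {nd, nr, nc}); // all (up to 4) neighbours go into the same queue
                }
            }
        }
        return -1; // target unreachable
    }
}

Note that all four neighbours of a cell are simply offered to the same queue; the queue's internal parent/child index arithmetic never interacts with the grid's notion of neighbours.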
Hope this helps!

Minimum distance between two polygons [duplicate]

I want to find the minimum distance between two polygons with millions of vertices (not the minimum distance between their vertices). I have to find the minimum of the shortest distances between each vertex of the first shape and all of the vertices of the other one. Something like the Hausdorff distance, but I need the minimum instead of the maximum.
Perhaps you should check out (PDF warning! Also note that, for some reason, the order of the pages is reversed) "Optimal Algorithms for Computing the Minimum Distance Between Two Finite Planar Sets" by Toussaint and Bhattacharya:
It is shown in this paper that the minimum distance between two finite planar sets if [sic] n points can be computed in O(n log n) worst-case running time and that this is optimal to within a constant factor. Furthermore, when the sets form a convex polygon this complexity can be reduced to O(n).
If the two polygons are crossing convex ones, perhaps you should also check out (PDF warning! Again, the order of the pages is reversed) "An Optimal Algorithm for Computing the Minimum Vertex Distance Between Two Crossing Convex Polygons" by Toussaint:
Let P = {p1, p2, ..., pm} and Q = {q1, q2, ..., qn} be two intersecting polygons whose vertices are specified by their cartesian coordinates in order. An optimal O(m + n) algorithm is presented for computing the minimum euclidean distance between a vertex pi in P and a vertex qj in Q.
There is a simple algorithm based on Minkowski addition that computes the minimum distance between two convex polygonal shapes and runs in O(n + m).
Links:
algoWiki, boost.org, neerc.ifmo.ru (in Russian).
If the Minkowski subtraction of two convex polygons covers (0, 0), then they intersect.
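For comparison, the brute-force version of the quantity the question describes (the minimum over all vertex pairs) is a straightforward O(n*m) double loop, which the papers and the Minkowski approach above are designed to avoid. The names below are illustrative:

class PolygonVertexDistance {
    // p and q hold the vertices of the two polygons as {x, y} pairs.
    static double minVertexDistance(double[][] p, double[][] q) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] a : p) {
            for (double[] b : q) {
                best = Math.min(best, Math.hypot(a[0] - b[0], a[1] - b[1]));
            }
        }
        return best;
    }
}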

Clustering of a set of 3D points

I have a 2D array of size n representing n points in 3D space, position[][], holding XYZ coordinates (e.g. position[0][0] is the X, position[0][1] the Y, and position[0][2] the Z coordinate of point 0).
What I need to do is cluster the points so that I get n/k clusters of size k, where each cluster consists of the k points closest to each other in 3D space. For instance, if n = 100 and k = 5, I want 20 clusters of 5 points each that are nearest neighbours in space.
How can I achieve that? (Pseudo-code is fine; code snippets preferably in Java.)
What I have been doing so far is a simple sort based on each component, but this does NOT necessarily give me the closest neighbours:
Sort based on X (position[0][0])
Then sort based on Y (position[0][1])
Then sort based on Z (position[0][2])
for (int i = 0; i < position.length; i++) {
    for (int j = i + 1; j < position.length; j++) {
        if (position[i][0] > position[j][0]) {
            // swap the whole points (rows), not just their X values
            double[] tmp = position[i];
            position[i] = position[j];
            position[j] = tmp;
        }
    }
}
// and do this for position[i][1] (i.e. Y) and then position[i][2] (i.e. Z)
I believe my question slightly differs from the Nearest neighbor search with kd-trees because neighbors in each iteration should not overlap with others. I guess we might need to use it as a component, but how, that's the question.
At the start you do not have an octree but a list of points instead, like:
float position[n][3];
So to ease the clustering and octree creation you can use a 3D point density map. It is similar to creating a histogram:
compute the bounding box of your points, O(n)
so process all points and determine the min and max coordinates.
create a density map, O(max(m^3, n))
So divide the used space (the bounding box) into a 3D voxel grid of resolution m (use whatever resolution you want/need) and build a density map like:
int map[m][m][m];
and clear it to zero:
for (int x = 0; x < m; x++)
    for (int y = 0; y < m; y++)
        for (int z = 0; z < m; z++)
            map[x][y][z] = 0;
Then process all points: determine each point's cell position from its x, y, z coordinates and increment that cell.
for (int i = 0; i < n; i++)
{
    int x = (m - 1) * (position[i][0] - xmin) / (xmax - xmin);
    int y = (m - 1) * (position[i][1] - ymin) / (ymax - ymin);
    int z = (m - 1) * (position[i][2] - zmin) / (zmax - zmin);
    map[x][y][z]++;
    // here you can add point i into the octree leaf representing this cell
}
That will give you a low-resolution density map. The higher the number in a cell map[x][y][z], the more points it contains, which means a cluster is there, and you can also move those points into that cluster in your octree.
This can be recursively repeated for cells that have enough points. To build your octree, create a 2x2x2 density map and recursively split each cell until its count is less than a threshold or the cell size is too small.
For more info see these similar Q&As:
Finding holes in 2d point sets? for the density map
Effective gif/image color quantization? for the clustering
What you want is not clustering. From what you said, I think you want to divide your N points into N/k groups, with each group having k points, while keeping the points in each group as close together as possible in 3D space.
Think of an easy example: if you wanted to do the same thing in one dimension, you would just sort the numbers and put the first k points into cluster 1, the second k points into cluster 2, and so on.
Returning to the 3D problem, the answer is similar. First find the point with the minimum x, y and z coordinates and put it, together with its k-1 closest points, into cluster 1. Then, for the rest of the points, again find the point with the minimum x, y and z coordinates and its k-1 closest unclustered points, and put them into cluster 2, and so on.
The above process will get you a result, but it may not be meaningful in practice; clustering algorithms such as k-means could help you.
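A rough Java sketch of that greedy grouping might look like this; the point representation (double[3]), the lexicographic choice of the "lowest" point, and all names are my own assumptions, and leftover points (when n is not a multiple of k) are simply ignored:

import java.util.ArrayList;
import java.util.List;

class GreedyGrouping {
    static List<List<double[]>> group(List<double[]> points, int k) {
        List<double[]> remaining = new ArrayList<>(points);
        List<List<double[]>> clusters = new ArrayList<>();
        while (remaining.size() >= k) {
            // pick the "lowest" remaining point (lexicographically by x, then y, then z)
            double[] seed = remaining.get(0);
            for (double[] p : remaining) {
                if (p[0] < seed[0]
                        || (p[0] == seed[0] && (p[1] < seed[1]
                        || (p[1] == seed[1] && p[2] < seed[2])))) {
                    seed = p;
                }
            }
            final double[] s = seed;
            // sort the pool by distance to the seed; the seed itself sorts first (distance 0)
            remaining.sort((a, b) -> Double.compare(squaredDist(a, s), squaredDist(b, s)));
            clusters.add(new ArrayList<>(remaining.subList(0, k))); // seed + its k-1 nearest
            remaining.subList(0, k).clear();                        // remove them from the pool
        }
        return clusters;
    }

    private static double squaredDist(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx * dx + dy * dy + dz * dz;
    }
}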

calculating the height of a Minimum spannning tree

I need Java code that could help me find the height of a minimum spanning tree.
Basically I am looking for an extension of Prim's/Kruskal's algorithm that not only produces the MST but also gives its height.
Thanks in advance.
Take as the root vertex one of the tree's centers and calculate the maximal distance from the chosen center to the leaf nodes.
The height can then be calculated in the following way:
set the height to 0
while there are at least 3 remaining vertices:
    delete all leaf vertices
    height := height + 1
if 2 vertices remain:
    height := height + 1
the remaining vertices are the centers of the tree
The time complexity is O(n).
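A possible Java sketch of that leaf-peeling procedure, assuming the MST is already available as an adjacency list (the representation and names are my own assumptions):

import java.util.ArrayDeque;
import java.util.List;

class TreeHeight {
    // adj.get(v) lists the neighbours of v in the (spanning) tree.
    static int heightFromCenter(List<List<Integer>> adj) {
        int n = adj.size();
        int[] degree = new int[n];
        ArrayDeque<Integer> leaves = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            degree[v] = adj.get(v).size();
            if (degree[v] <= 1) leaves.add(v);
        }
        int remaining = n, height = 0;
        while (remaining > 2) {
            int layer = leaves.size();          // peel one full layer of leaves
            for (int i = 0; i < layer; i++) {
                int leaf = leaves.poll();
                remaining--;
                for (int nb : adj.get(leaf)) {
                    if (--degree[nb] == 1) leaves.add(nb);
                }
            }
            height++;
        }
        if (remaining == 2) height++;           // two centers: one extra level
        return height;
    }
}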
A practical way of calculating the height would be to combine all leaf nodes into a single node that serves as a root, then calculate the MST of that modification and take the height as the maximum distance from the generated root node to the newly generated leaf nodes.
I am not writing the code here, just giving you a hint.
In every node keep a variable that stores the height/depth of that node.
So, the depth of the starting node will be 0. Now, whenever you add an edge to the MST, the depth of the newly added node is the depth of the node it connects to, plus 1.
Say you currently have the following discovered nodes with these depths:
a 0
b 1
c 1
Now suppose you want to add the edge from c to d; the depth of node d will then be depth(c) + 1, i.e. 1 + 1 = 2.
Also, keep track of the maximum depth among all nodes at each step of the algorithm.
The final answer will then be the maximum depth among all the nodes of the tree.
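A hedged sketch of that depth-tracking idea combined with a basic Prim's algorithm might look like this (the adjacency-list representation and all names are my own assumptions, not code from the answer):

import java.util.List;
import java.util.PriorityQueue;

class PrimWithHeight {
    // graph.get(u) holds {v, weight} entries for each edge u-v.
    static int mstHeight(List<List<int[]>> graph, int start) {
        int n = graph.size();
        boolean[] inTree = new boolean[n];
        int[] depth = new int[n];                   // per-node depth, as the answer suggests
        // Queue entries are {edgeWeight, node, depthIfAdded}, ordered by edge weight.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[] {0, start, 0});
        int maxDepth = 0;
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int node = cur[1], d = cur[2];
            if (inTree[node]) continue;             // already connected via a cheaper edge
            inTree[node] = true;
            depth[node] = d;
            maxDepth = Math.max(maxDepth, d);       // track the maximum depth seen so far
            for (int[] edge : graph.get(node)) {
                if (!inTree[edge[0]]) {
                    pq.add(new int[] {edge[1], edge[0], d + 1});
                }
            }
        }
        return maxDepth;                            // height of the MST rooted at `start`
    }
}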

Best algorithm to get all combination pairs of nodes in an undirected graph (need to improve time complexity)

I have an undirected graph A with: no multiple links between any two nodes, no self-loops, and possibly some isolated nodes (nodes with degree 0).
I need to go through all possible pairs of nodes in graph A to assign some kind of score to the non-existing links (say my graph has k nodes and n links; then the number of pairs to score should be k*(k-1)/2 - n). The score I assign is based on the common neighbours of the two nodes in the pair.
Ex: the score between A-D should be 1, the score between G-D should be 0, ...
The biggest problem is that my graph has more than 100,000 nodes, and it was too slow to handle the almost 10^10 combinations of my first attempt at a solution.
My second thought is that, since the score is based on common neighbours, I might only need to look at the neighbourhoods, so that I only compute the scores that are different from 0; the rest can be set to 0 without any computation. But this could possibly count a combination more than once.
Any idea of how to approach this is appreciated. Please keep in mind that the actual network has more than 100,000 nodes.
If you represent your graph as an adjacency list (rather than an adjacency matrix), you can make use of the fact that your graph has only 600,000 edges to hopefully reduce the computation time.
Let's take a node V[j] with neighbors V[i] and V[k]:
V[i] ---- V[j] ---- V[k]
To find all such pairs of neighbors you can take the list of nodes adjacent to V[j] and find all combinations of those nodes. To avoid duplicates you will have to generate the combinations rather than the permutations of the end nodes V[i] and V[k] by requiring that i < k.
Alternatively, you can start with node V[i] and find all of the nodes that have a distance of 2 from V[i]. Let S be the set of all the nodes adjacent to V[i]. For each node V[j] in S, create a path V[i]-V[j]-V[k] where:
V[k] is adjacent to V[j]
V[k] is not an element of S (to avoid directly connected nodes)
k != i (to avoid cycles)
k > i (to avoid duplications)
I personally like this approach better because it completes the adjacency list for one node before moving on to the next.
Given that you have ~600,000 edges in a graph with ~100,000 nodes, and assuming an even distribution of edges across the nodes, each node would have an average degree of 12. The number of possible paths for each node is then on the order of 10^2. Over 10^5 nodes that gives on the order of 10^7 total paths, rather than the theoretical limit of 10^10 for a complete graph. Still a large number, but a thousand times faster than before.
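A rough Java sketch of the first approach (enumerating pairs of neighbours of each node and skipping pairs that are already edges) might look like this; the adjacency-set representation, the pair encoding, and all names are my own assumptions:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CommonNeighborScore {
    // adj.get(v) is the set of neighbours of v.
    static Map<Long, Integer> score(List<Set<Integer>> adj) {
        Map<Long, Integer> scores = new HashMap<>();
        int n = adj.size();
        for (int j = 0; j < n; j++) {
            Integer[] nbrs = adj.get(j).toArray(new Integer[0]);
            for (int a = 0; a < nbrs.length; a++) {
                for (int b = a + 1; b < nbrs.length; b++) {
                    int i = Math.min(nbrs[a], nbrs[b]);
                    int k = Math.max(nbrs[a], nbrs[b]);
                    if (adj.get(i).contains(k)) continue;  // existing edge: nothing to score
                    long key = (long) i * n + k;           // encode the pair (i, k), i < k
                    scores.merge(key, 1, Integer::sum);    // j is one more common neighbour of i and k
                }
            }
        }
        return scores; // pairs absent from the map implicitly have score 0
    }
}

Pairs of non-adjacent nodes with no common neighbour never appear in the map, which is exactly the shortcut from the question: the rest can be treated as 0 with no computation.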
