Please bear with me; I am very new to data structures.
I am getting confused about how a priority queue is used to solve minimum distance. For example, if I have a matrix and want to find the minimum distance from the source to the destination, I know that I would perform Dijkstra's algorithm, in which a queue lets me easily find the distance between the source and every element in the matrix.
However, I am confused about how a heap plus priority queue is used here. Say I start at (1,1) on a grid and want to find the minimum distance to (3,3). I know how to implement the algorithm in the sense of finding the neighbours, checking the distances, and marking cells as visited, but I have read about priority queues and min heaps and want to implement that.
Right now, my only understanding is that a priority queue positions elements by a key. My issue is that when I insert the first neighbours (1,0), (0,0), (2,1), (1,2), they are inserted into the PQ based on a key (which would be distance in this case), so the next search would be the element in the matrix with the shortest distance. But with the PQ, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This would go against the 2*i and 2*i + 1 and i/2 indexing.
In conclusion, I don't understand how a min heap plus priority queue works for finding the minimum of something like distance.
      0   1   2   3
    +---+---+---+---+
  0 | 2 | 1 | 3 | 2 |
  1 | 1 | 3 | 5 | 1 |
  2 | 5 | 2 | 1 | 4 |
  3 | 2 | 4 | 2 | 1 |
    +---+---+---+---+
You need the priority queue to get the minimum-weight move at every step, so a MinPQ is a good fit for this.
MinPQ internally uses the heap technique to put elements in the right position, via operations such as sink() and swim().
So MinPQ is the data structure, and it uses a heap internally.
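As a rough illustration (my sketch, not part of the answer above): here is Dijkstra on a weight grid using Python's heapq as the MinPQ, assuming 4-neighbour moves and that grid[r][c] is the cost of stepping onto cell (r, c). The heap is keyed by tentative distance only; the coordinates just ride along.

import heapq

def grid_dijkstra(grid, source, dest):
    # e.g. grid_dijkstra([[2,1,3,2],[1,3,5,1],[5,2,1,4],[2,4,2,1]], (1,1), (3,3))
    rows, cols = len(grid), len(grid[0])
    dist = {source: 0}
    pq = [(0, source)]                       # entries are (distance so far, cell)
    while pq:
        d, (r, c) = heapq.heappop(pq)        # always pops the cheapest entry
        if (r, c) == dest:
            return d
        if d > dist.get((r, c), float("inf")):
            continue                         # stale entry left over from a relaxation
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + grid[nr][nc]        # assumed cost model: pay for the cell you enter
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return None                              # destination unreachable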
If I'm interpreting your question correctly, you're getting stuck at this point:
But with the PQ, how can a heap be used here with more than 2 neighbours? For example, the children of (1,1) are the 4 neighbours stated above. This would go against the 2*i and 2*i + 1 and i/2 indexing.
It sounds like what's tripping you up is that there are two separate concepts here that you may be conflating. First, there's the notion that "two places in a grid might be next to one another." In that world, each location has (up to) four neighbours. Next, there's the shape of the binary heap, in which each node has two children whose locations are given by certain arithmetic computations on array indices. Those are completely independent of one another: the binary heap has no idea that the items it's storing come from a grid, and the grid has no idea that there's an array where each node has two children stored at particular positions.
For example, suppose you want to store locations (0, 0), (2, 0), (-2, 0) and (0, 2) in a binary heap, and that the weights of those locations are 1, 2, 3, and 4, respectively. Then the shape of the binary heap might look like this:
              (0, 0)
             Weight 1
            /        \
      (2, 0)          (0, 2)
     Weight 2        Weight 4
        /
    (-2, 0)
    Weight 3
This tree still gives each node two children; those children just don't necessarily map back to the relative positions of nodes in the grid.
More generally, treat the priority queue as a black box. Imagine that it's just a magic device that says "you can give me some new thing to store" and "I can give you the cheapest thing you've given me so far," and that's it. The fact that, internally, it coincidentally happens to be implemented as a binary heap is essentially irrelevant.
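For instance, Python's heapq module behaves exactly like that black box (a hypothetical snippet of mine, using the neighbours from the question with made-up weights):

import heapq

pq = []                                   # the "magic device"
for weight, cell in [(2, (1, 0)), (1, (0, 0)), (3, (2, 1)), (2, (1, 2))]:
    heapq.heappush(pq, (weight, cell))    # "store this new thing"
print(heapq.heappop(pq))                  # (1, (0, 0)): the cheapest thing so far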
Hope this helps!
Related
I want to generate a list of random numbers of size 500 that is exactly 30% sorted. I know how to generate a list that is at least 30% sorted, but that's not what I want. How do I generate a list that is "exactly" 30% sorted? I'm stuck; how can this be done?
Here is the exact wording
"For the sorts, you should construct three different files of each size: ordered, keys in reverse order, and finally one in which 30% of the keys are ordered. The latter file should not consist of files in which your sort is 30% complete, but rather in files in which 30% of the keys are correctly placed with respect to one another but are not necessarily contiguous.
There are 2 main ideas I can see for percentage sorted:
Simply the number of elements out of place.
One should be able to get an approximately 30%-sorted list by sorting it, then iterating through it and keeping each element in place with the desired percentage as probability, otherwise swapping it with a random remaining element (so, if we want 30% sorted, we keep an element in place with 30% probability, and swap it with 70% probability).
If an exact number is needed, one could use the above result and (intelligently) swap random elements until the desired percentage is obtained.
The number of inversions.
An inversion is a pair of positions in a sequence whose elements are out of their natural order.
One idea is to first sort the list, then swap random elements that get us closer to the desired percentage sorted, until we get there (a simplified sketch of this appears after this list).
Only swapping elements that get us closer to the desired result is difficult (at least doing so efficiently).
A very brute-force approach would be to count the change in the number of inversions that each possible swap would cause, and then pick a random one among those that get us closer to our target.
Another idea is to just generate random pairs and count the change in inversions until we find one that gets us closer.
A third option is to pick a random element. If it's larger than half the elements, try to move it left (ideally increasing the number of inversions). If it's smaller, try to move it right. In trying to move it left/right, we can look for a smaller / larger element (respectively) to swap it with and count the change in inversions (we only need to consider the elements between the swapped elements when counting the change in inversions).
At first we could probably just swap elements at random, since early on a random swap is likely to increase the number of inversions.
If the percentage is above 50%, we could also start with a reversed array, i.e. 100% unsorted.
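Here is a minimal sketch of that sort-then-swap idea, under two assumptions of mine: "p% sorted" means having (1 - p) of the maximum n*(n-1)/2 inversions, and only adjacent in-order pairs are swapped, so each accepted swap adds exactly one inversion and the target is hit exactly.

import random

def exactly_sorted(n, sortedness):
    # Interpret sortedness (e.g. 0.3 for "30% sorted") as an inversion count.
    target = round((1 - sortedness) * n * (n - 1) / 2)
    a = list(range(n))
    inversions = 0
    while inversions < target:
        i = random.randrange(n - 1)
        if a[i] < a[i + 1]:                  # in-order adjacent pair:
            a[i], a[i + 1] = a[i + 1], a[i]  # swapping adds exactly 1 inversion
            inversions += 1
    return a

This hits the inversion count exactly but does not sample uniformly among all permutations with that count; the Gibbs sampler below is aimed at that.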
There's a one-to-one correspondence between permutations and {0} x {0, 1} x {0, 1, 2} x ... x {0, 1, ..., n - 1}, where the jth element of the tuple is the number of inversions involving the element at position j and an element at some position i < j. In this light, the problem is sampling a random element of the codomain that sums to the desired number of inversions.
Here's an instance of Gibbs sampling for this problem. Initialize a tuple summing to the desired number of inversions. Repeatedly select two distinct indices and rerandomize their entries uniformly among all possibilities with the same sum. Stop when you're tired of waiting (the distribution converges on uniform but never gets there; maybe tomorrow I will figure out a Propp--Wilson style technique for exact samples).
In Python (untested):

import random

def gibbs(n, target, iterations=100000):
    # Inversion table: perm[j] = number of inversions involving position j
    # and an earlier position, so 0 <= perm[j] <= j.
    perm = [0] * n
    for i in range(n):
        perm[i] = min(target, i)
        target -= perm[i]
    assert target == 0
    # Gibbs steps: pick two positions and redistribute their combined
    # inversion count uniformly over the valid possibilities.
    for _ in range(iterations):  # placeholder for "stop when tired of waiting"
        i = random.randrange(n)
        j = random.randrange(n)
        if i == j:
            continue
        total = perm[i] + perm[j]
        perm[i] = random.randrange(max(total - j, 0), min(total, i) + 1)
        perm[j] = total - perm[i]
    # Decode the inversion table into a permutation of 0 .. n-1.
    for j in range(n):
        perm[j] = j - perm[j]
        for i in range(j):
            if perm[i] >= perm[j]:
                perm[i] += 1
    return perm
One could also get exact samples by dynamic programming and conditional probability, but the running time for 500 looks slightly prohibitive from here.
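For what it's worth, here is a sketch of that dynamic-programming idea (my code, feasible only for small n, consistent with the remark above): count inversion tables by their sum, then sample each entry by conditional probability.

import random

def exact_sample(n, target):
    # tables[i][k] = number of inversion tables over positions 0..i with
    # sum k (the entry at position i ranges over 0..i). Assumes
    # 0 <= target <= n*(n-1)/2.
    tables = [[1]]                        # position 0: the entry is always 0
    for i in range(1, n):
        prev = tables[-1]
        cur = [0] * (len(prev) + i)
        for k, w in enumerate(prev):
            for c in range(i + 1):
                cur[k + c] += w
        tables.append(cur)
    # Sample entries from the last position down, weighting each candidate
    # entry c by the number of ways to complete the remaining sum.
    entries = [0] * n
    for i in range(n - 1, 0, -1):
        prev = tables[i - 1]
        weights = [prev[target - c] if 0 <= target - c < len(prev) else 0
                   for c in range(i + 1)]
        r = random.randrange(sum(weights))
        for c, w in enumerate(weights):
            if r < w:
                entries[i] = c
                target -= c
                break
            r -= w
    assert target == 0
    return entries                        # decode exactly as gibbs() does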
I have an undirected graph A that has: no multiple links between any two nodes, no self-loops, and possibly some isolated nodes (nodes with degree 0).
I need to go through all possible pairs of nodes in graph A and assign a score to each non-existent link (if my graph has k nodes and n links, the number of such pairs is k*(k-1)/2 - n). The score I assign is based on the common neighbours of the two nodes in the pair.
Ex: score between A-D should be 1, score between G-D should be 0 ...
The biggest problem is that my graph has more than 100,000 nodes, and my first attempt, handling almost 10^10 combinations, was too slow.
My second thought is that since the score is based on common neighbours, I might only need to look at the neighbourhoods, computing only the scores that differ from 0; the rest can be set to 0 with no computation. But this could visit the same pair more than once.
Any idea on how to approach this is appreciated. Please keep in mind that the actual network has more than 100,000 nodes.
If you represent your graph as an adjacency list (rather than an adjacency matrix), you can make use of the fact that your graph has only 600,000 edges to hopefully reduce the computation time.
Let's take a node V[j] with neighbors V[i] and V[k]:
V[i] ---- V[j] ---- V[k]
To find all such pairs of neighbors you can take the list of nodes adjacent to V[j] and find all combinations of those nodes. To avoid duplicates you will have to generate the combinations rather than the permutations of the end nodes V[i] and V[k] by requiring that i < k.
Alternatively, you can start with node V[i] and find all of the nodes that have a distance of 2 from V[i]. Let S be the set of all the nodes adjacent to V[i]. For each node V[j] in S, create a path V[i]-V[j]-V[k] where:
V[k] is adjacent to V[j]
V[k] is not an element of S (to avoid directly connected nodes)
k != i (to avoid cycles)
k > i (to avoid duplications)
I personally like this approach better because it completes the adjacency list for one node before moving on to the next.
Given that you have ~600,000 edges in a graph with ~100,000 nodes, assuming an even distribution of edges across all of the nodes, each node would have an average degree of 12. The number of possible paths for each node is then on the order of 10^2. Over 10^5 nodes that gives on the order of 10^7 total paths, rather than the theoretical limit of 10^10 for a complete graph. Still a large number, but a thousand times faster than before.
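A minimal sketch of the first approach (my illustration, assuming the adjacency list is a dict mapping each node to a set of its neighbours, with sortable node labels):

from collections import defaultdict
from itertools import combinations

def common_neighbor_scores(adj):
    # Only non-adjacent pairs with at least one common neighbour get an
    # entry; every other non-existent link implicitly scores 0.
    scores = defaultdict(int)
    for j in adj:
        # Every pair of neighbours of j shares j as a common neighbour;
        # sorting gives each pair one canonical (i, k) order, i < k.
        for i, k in combinations(sorted(adj[j]), 2):
            if k not in adj[i]:           # skip directly connected pairs
                scores[(i, k)] += 1
    return scores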
I have to represent a graph in Java, but neither as an adjacency list nor as an adjacency matrix.
The basic idea is that if deg[i] is the exit degree of vertex i, then its neighbours can be stored in edges[i][j] where 1 <= j <= deg[i]; but given that edges[][] must be initialized with some values, I don't know how to make it differ from an adjacency matrix.
Any suggestions?
To my knowledge, there are only two ways to represent a graph in a language:
Either use an adjacency matrix,
Or use an incidence matrix.
You can make an incidence matrix like this (for example, for a graph with edges E1 = V1-V2, E2 = V2-V3, E3 = V3-V4, E4 = V4-V1; a self-loop would show up as a 2 in its edge's column):

     E1  E2  E3  E4
V1    1   0   0   1
V2    1   1   0   0
V3    0   1   1   0
V4    0   0   1   1
You are fighting against lower bounds on this question. The two main representations of the graph are already very good for their respective uses.
Adjacency list minimizes space. You will be hard pressed to use less memory than one pointer per edge. Space: O(V + E). Search: O(V).
Adjacency matrix is very fast, at the cost of V^2 space. Space: O(V^2). Search: O(1).
So, to make something that is better in both space and time, you will have to combine the ideas of both. Also realize that you will only get better practical performance; theoretically you will not improve on O(1) search or O(V + E) size.
My idea would be to store all the graph nodes in one array. Then, for each node, have an adjacency list represented as a bit vector. This would essentially be a matrix-like representation, but only for those nodes that exist in the graph, giving you a smaller size than a full matrix. Search would be slightly improved over an adjacency list, as the query node can be tested against the bit vector directly.
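A rough sketch of that idea (mine, in Python for brevity, using one int per node as the bit vector; in Java you'd reach for a long[] or BitSet):

class BitVectorGraph:
    def __init__(self, n):
        self.n = n
        self.bits = [0] * n              # bits[u] has bit v set iff edge (u, v)

    def add_edge(self, u, v):
        self.bits[u] |= 1 << v
        self.bits[v] |= 1 << u

    def has_edge(self, u, v):            # a single bit test, no list scan
        return (self.bits[u] >> v) & 1 == 1

    def neighbors(self, u):
        return [v for v in range(self.n) if (self.bits[u] >> v) & 1]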
Also check out sparse matrix representations.
I have built an 8-puzzle solver using breadth-first search. I would now like to modify the code to use heuristics. I would be grateful if someone could answer the following two questions:
Solvability
How do we decide whether an 8-puzzle is solvable? (Given a starting state and a goal state.)
This is what Wikipedia says:
The invariant is the parity of the permutation of all 16 squares plus
the parity of the taxicab distance (number of rows plus number of
columns) of the empty square from the lower right corner.
Unfortunately, I couldn't understand what that means; it was a bit complicated. Can someone explain it in simpler language?
Shortest Solution
Given a heuristic, is it guaranteed to give the shortest solution using the A* algorithm? To be more specific, will the first node in the open list always have a depth (or number of moves made so far) that is the minimum of the depths of all the nodes present in the open list?
Should the heuristic satisfy some condition for the above statement to be true?
Edit: How is it that an admissible heuristic will always provide the optimal solution? And how do we test whether a heuristic is admissible?
I would be using the heuristics listed here
Manhattan Distance
Linear Conflict
Pattern Database
Misplaced Tiles
Nilsson's Sequence Score
N-MaxSwap X-Y
Tiles out of row and column
For clarification, from Eyal Schneider:
I'll refer only to the solvability issue. Some background in permutations is needed.
A permutation is a reordering of an ordered set. For example, 2134 is a reordering of the list 1234, where 1 and 2 swap places. A permutation has a parity property; it refers to the parity of the number of inversions. For example, in the following permutation you can see that exactly 3 inversions exist (23,24,34):
1234
1432
That means that the permutation has an odd parity. The following permutation has an even parity (12, 34):
1234
2143
Naturally, the identity permutation (which keeps the items order) has an even parity.
Any state in the 15 puzzle (or 8 puzzle) can be regarded as a permutation of the final state, if we look at it as a concatenation of the rows, starting from the first row. Note that every legal move changes the parity of the permutation (because we swap two elements, and the number of inversions involving items in between them must be even). Therefore, if you know that the empty square has to travel an even number of steps to reach its final state, then the permutation must also be even. Otherwise, you'll end with an odd permutation of the final state, which is necessarily different from it. Same with odd number of steps for the empty square.
According to the Wikipedia link you provided, the criterion above is both necessary and sufficient for a given puzzle to be solvable.
The A* algorithm is guaranteed to find the shortest solution (or one of them, if several are equally short), provided your heuristic always underestimates the real cost (in your case, the real number of moves needed to reach the solution).
But off the top of my head I cannot come up with a good heuristic for your problem; that needs some thinking.
The real art of using A* is to find a heuristic that always underestimates the real cost, but by as little as possible, to speed up the search.
First ideas for such a heuristic:
A quite bad but valid heuristic that popped into my mind is the Manhattan distance of the empty field to its final destination.
The sum of the Manhattan distances of each field to its final destination, divided by the maximal number of fields that can change position within one move. (I think this is quite a good heuristic.)
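A minimal sketch of that second heuristic for the 8-puzzle (my code; states are flat sequences in row order with 0 for the blank; since exactly one tile moves per step here, the divisor is 1 and the plain sum is admissible):

def manhattan_sum(state, goal):
    # Goal position of every tile, from a flat index to (row, column).
    goal_pos = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                     # the blank doesn't count
        gr, gc = goal_pos[tile]
        total += abs(i // 3 - gr) + abs(i % 3 - gc)
    return total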
For anyone coming along, I will attempt to explain how the OP got the value pairs, as well as how he determines the highlighted ones, i.e. the inversions, as it took me several hours to figure out. First, the pairs.
First take the goal state and imagine it as a 1D array (A, for example): [1,2,3,8,0,4,7,5]. Each value in that array gets its own column in the table (going all the way down), which supplies the first value of each pair.
Then move over one value to the right in the array (i + 1) and go all the way down again for the second pair value. For example (state A): the first column's second values start at [2,3,8,0,4,7,5] going down; the second column's start at [3,8,0,4,7,5]; etc.
Okay, now for the inversions. For each pair of values, find their index locations in the start state. If the left index > the right index, then it's an inversion (highlighted). The first four pairs of state A are: (1,2), (1,3), (1,8), (1,0).
1 is at index 3, 2 is at index 0: 3 > 0, so inversion.
1 is at index 3, 3 is at index 2: 3 > 2, so inversion.
1 is at index 3, 8 is at index 1: 3 > 1, so inversion.
1 is at index 3, 0 is at index 7: 3 < 7, so no inversion.
Do this for each pair and tally up the total inversions.
If both are even or both are odd (the Manhattan distance of the blank spot and the total inversions), then it's solvable. Hope this helps!
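A compact sketch of that test (my code, not the OP's; 3x3 states as flat lists with 0 for the blank, counting pairs over all squares as described above):

def is_solvable(start, goal):
    # Index of each tile in the start state (0 is the blank).
    where = {tile: i for i, tile in enumerate(start)}
    # Count inversions: pairs ordered one way in the goal, reversed in start.
    inversions = 0
    for a in range(len(goal)):
        for b in range(a + 1, len(goal)):
            if where[goal[a]] > where[goal[b]]:
                inversions += 1
    # Manhattan distance of the blank between its start and goal squares.
    s, g = start.index(0), goal.index(0)
    blank = abs(s // 3 - g // 3) + abs(s % 3 - g % 3)
    return inversions % 2 == blank % 2   # "both even or both odd"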
Could someone guide me on how to solve this problem?
We are given a set S with k elements in it.
We have to divide the set S into x subsets such that the number of elements in each subset differs by no more than 1, and the sums of the subsets are as close to each other as possible.
Example 1:
{10, 20, 90, 200, 100} has to be divided into 2 subsets
Solution:{10,200}{20,90,100}
sum is 210 and 210
Example 2:
{1, 1, 2, 1, 1, 1, 1, 1, 1, 6}
Solution:{1,1,1,1,6}{1,2,1,1,1}
Sum is 10 and 6.
I see a possible solution in two stages.
Stage One
Start by selecting the number of subsets, N.
Sort the original set, S, if possible.
Distribute the largest N numbers from S into subsets 1 to N in order.
Distribute the next N largest numbers from S into the subsets in reverse order, N to 1.
Repeat until all numbers are distributed.
If you can't sort S, then distribute each number from S into the subset (or one of the subsets) with the least entries and the smallest total.
You should now have N subsets all sized within 1 of each other and with very roughly similar totals.
Stage Two
Now try to refine the approximate solution you have.
Pick the subset with the largest total, L, and the subset with the smallest total, M. Pick a number in L that is larger than some number in M, but not by so much that swapping the two would increase the absolute difference between the two subsets. Swap the two numbers. Repeat. Not all pairs of subsets will have swappable numbers. Each swap keeps the subsets the same size.
If you have a lot of time you can do a thorough search; if not then just try to pick off a few obvious cases. I would say don't swap numbers if there is no decrease in difference; otherwise you might get an infinite loop.
You could interleave the stages once there are at least two members in some subsets.
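Here is a minimal sketch of both stages (my illustration, assuming numeric items; the names are made up):

def balanced_partition(S, N):
    # Stage one: deal the sorted numbers out in "snake" order, so subset
    # sizes differ by at most 1 and totals start out roughly similar.
    subsets = [[] for _ in range(N)]
    items = sorted(S, reverse=True)
    for r in range(0, len(items), N):
        chunk = items[r:r + N]
        order = range(N) if (r // N) % 2 == 0 else reversed(range(N))
        for s, x in zip(order, chunk):
            subsets[s].append(x)
    # Stage two: swap a larger number out of the biggest-total subset for a
    # smaller one from the smallest-total subset while that shrinks the gap.
    improved = True
    while improved:
        improved = False
        sums = [sum(s) for s in subsets]
        hi, lo = sums.index(max(sums)), sums.index(min(sums))
        diff = sums[hi] - sums[lo]
        for i, a in enumerate(subsets[hi]):
            for j, b in enumerate(subsets[lo]):
                if 0 < a - b < diff:     # strictly decreases the difference
                    subsets[hi][i], subsets[lo][j] = b, a
                    improved = True
                    break
            if improved:
                break
    return subsets

On the first example this gets stuck at {200, 20, 10} / {100, 90}; reaching the optimal {10, 200} / {20, 90, 100} needs the more thorough search mentioned above, since not all pairs of subsets have swappable numbers.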
There is no easy algorithm for this problem.
Check out the partition problem, also known as the easiest hard problem, which covers the case of 2 sets. That problem is NP-complete, and you should be able to find plenty of algorithms for it on the web.
I know your problem is a bit different since you can choose the number of sets, but you can take inspiration from solutions to the previous one.
For example, you can transform this into a series of integer linear programs. Let k be the number of elements in your set, {a_1, ..., a_k}.

For i = 2 to k:
    try to solve the following program:
        x_{j,l} = 1 if element j of the set is in subset number l (l <= i), and 0 otherwise

        minimise  max(|sum_p(a_p * x_{p,n}) - sum_p(a_p * x_{p,m})| for all m, n)
            // you minimise the max of the difference between 2 subsets
        s.t.
            sum over n of x_{p,n} = 1   for each p
                // each element goes in exactly one subset
            (sum over p of x_{p,n}) - (sum over p of x_{p,m}) <= 1   for all m, n
                // the numbers of elements in 2 subsets differ by at most one
            x_{p,n} in {0, 1}

    if you find a minimum less than one, then stop
    otherwise continue
end for
Hope my notation is clear.
The complexity of this approach is exponential, and if you found a polynomial way to solve it you would prove P = NP, so I don't think you can do better.
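For a single value of i, the inner program could be sketched as follows (my illustration; it assumes the third-party PuLP library, and linearizes the min-max objective with an auxiliary variable t, a standard trick):

from pulp import LpProblem, LpVariable, LpMinimize, lpSum

def balance_into(a, i):
    k = len(a)
    prob = LpProblem("balance", LpMinimize)
    x = [[LpVariable(f"x_{j}_{l}", cat="Binary") for l in range(i)]
         for j in range(k)]
    t = LpVariable("t", lowBound=0)       # bounds every pairwise sum difference
    prob += t                             # objective: minimise t
    sums = [lpSum(a[j] * x[j][l] for j in range(k)) for l in range(i)]
    sizes = [lpSum(x[j][l] for j in range(k)) for l in range(i)]
    for j in range(k):
        prob += lpSum(x[j]) == 1          # each element in exactly one subset
    for n in range(i):
        for m in range(i):
            if m != n:
                prob += sums[n] - sums[m] <= t    # together: |sum_n - sum_m| <= t
                prob += sizes[n] - sizes[m] <= 1  # sizes differ by at most 1
    prob.solve()
    return t.value(), [[a[j] for j in range(k) if x[j][l].value() > 0.5]
                       for l in range(i)]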
EDIT
I saw your comment; I misunderstood the constraint on the size of the subsets (I thought it was about the difference between 2 sums).
I don't have time to update it right now; I will do it when I have time.
Sorry.
EDIT 2
I edited the linear program and it should now do what is asked; I just added a constraint.
Hope this time the problem is fully understood, even though this solution might not be optimal.
I'm no scientist, so I'd try two approaches:
After sorting the items, going from both "ends" and moving the first and last into the current subset, then shifting to the next subset, and looping;
then:
checking the differences of the sums of the subsets, and shuffling items if it would help.
Or: encoding the resulting subsets appropriately and trying genetic algorithms.