I have implemented the DBSCAN algorithm to cluster 3D point cloud data. It works very well indeed, but the only problem is that the processing time is too long: almost 15 seconds for a 6,000-point cloud. I want to implement multithreading to reduce the processing time. I would highly appreciate any help with adding multithreading to the following complete piece of code. Thanks!
public ArrayList<List<Vector>> Run() {
    int index = 0; // index for each point in the cloud (cloud --> input data)
    List<Vector> neighbors;
    ArrayList<List<Vector>> resultList = new ArrayList<List<Vector>>(); // group of clusters
    while (cloud.size() > index) {
        Vector p = cloud.get(index);
        if (!visited.contains(p)) {
            visited.add(p);
            neighbors = get_neighbors(p);
            if (neighbors.size() >= minPts) { // minPts = 5
                int ind = 0;
                while (neighbors.size() > ind) {
                    Vector r = neighbors.get(ind);
                    if (!visited.contains(r)) {
                        visited.add(r);
                        List<Vector> individualNeighbors = get_neighbors(r);
                        if (individualNeighbors.size() >= minPts) {
                            neighbors = merge_neighbors(neighbors, individualNeighbors);
                        }
                    }
                    ind++;
                }
                resultList.add(neighbors);
            }
        }
        index++;
    }
    return resultList;
}

private List<Vector> merge_neighbors(List<Vector> neighborPts1, List<Vector> neighborPts2) {
    for (Vector n2 : neighborPts2) {
        if (!neighborPts1.contains(n2)) {
            neighborPts1.add(n2);
        }
    }
    return neighborPts1;
}

private List<Vector> get_neighbors(Vector pt) {
    CopyOnWriteArrayList<Vector> pts = new CopyOnWriteArrayList<>();
    for (Vector p : cloud) {
        if (computeDistance(pt, p) <= eps * eps) { // computeDistance returns the squared distance
            pts.add(p);
        }
    }
    return pts;
}

private double computeDistance(Vector core, Vector target) {
    return Math.pow(core.getX() - target.getX(), 2)
            + Math.pow(core.getY() - target.getY(), 2)
            + Math.pow(core.getZ() - target.getZ(), 2);
}
}
A) There is a lot of optimization potential in your implementation that is easier to exploit than multithreading, so optimize your code first.
In particular, if you load your data into tools such as ELKI (make sure to add a spatial index, which is not the default), you'll notice that they run much faster even with just a single thread.
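To make point A concrete: visited.contains(p) on a List is O(n) and runs inside the outer loop, and get_neighbors scans all 6,000 points for every single query. Below is a minimal sketch of two cheap fixes, assuming your Vector class implements equals()/hashCode() (that class isn't shown in the question): keep visited in a HashSet so contains is O(1), and bucket the points into a uniform grid with cell size eps so each neighbor query only has to inspect the 27 surrounding cells.

import java.util.*;

// Sketch of a uniform-grid index for eps-range queries (names are illustrative).
// Assumes Vector exposes getX()/getY()/getZ(); with cell size eps, every true
// neighbor of a point lies in its own cell or one of the 26 adjacent cells.
class GridIndex {
    private final Map<List<Long>, List<Vector>> cells = new HashMap<>();
    private final double eps;

    GridIndex(List<Vector> cloud, double eps) {
        this.eps = eps;
        for (Vector p : cloud) {
            cells.computeIfAbsent(cellOf(p), k -> new ArrayList<>()).add(p);
        }
    }

    private List<Long> cellOf(Vector p) {
        return Arrays.asList(
                (long) Math.floor(p.getX() / eps),
                (long) Math.floor(p.getY() / eps),
                (long) Math.floor(p.getZ() / eps));
    }

    List<Vector> neighbors(Vector pt) {
        List<Long> c = cellOf(pt);
        List<Vector> result = new ArrayList<>();
        for (long dx = -1; dx <= 1; dx++)
            for (long dy = -1; dy <= 1; dy++)
                for (long dz = -1; dz <= 1; dz++) {
                    List<Vector> cell = cells.get(Arrays.asList(
                            c.get(0) + dx, c.get(1) + dy, c.get(2) + dz));
                    if (cell == null) continue;
                    for (Vector p : cell) {
                        double ddx = pt.getX() - p.getX();
                        double ddy = pt.getY() - p.getY();
                        double ddz = pt.getZ() - p.getZ();
                        if (ddx * ddx + ddy * ddy + ddz * ddz <= eps * eps) {
                            result.add(p);
                        }
                    }
                }
        return result;
    }
}

With a HashSet for visited and get_neighbors delegating to such a grid, the 15-second run should shrink dramatically before any threads are involved; that is exactly why tools like ELKI with a spatial index are fast on a single thread.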
B) There are publications on multicore DBSCAN that discuss the difficulties and challenges of parallelizing DBSCAN. Read them first, as the whole story is too long for this Q&A format:
Patwary, M. A., Palsetia, D., Agrawal, A., Liao, W. K., Manne, F., & Choudhary, A. (2012, November). A new scalable parallel DBSCAN algorithm using the disjoint-set data structure. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (p. 62). IEEE Computer Society Press.
Götz, M., Bodenstein, C., & Riedel, M. (2015, November). HPDBSCAN: highly parallel DBSCAN. In Proceedings of the workshop on machine learning in high-performance computing environments (p. 2). ACM.
Welton, B., Samanas, E., & Miller, B. P. (2013, November). Mr. scan: Extreme scale density-based clustering using a tree-based network of gpgpu nodes. In SC'13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (pp. 1-11). IEEE.
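That said, one piece of your code parallelizes trivially without touching the literature above: the region queries are independent, read-only scans over the cloud. A minimal sketch, assuming cloud is not mutated while Run() executes (which holds in the posted code):

// Parallelize only the neighbor search; the cluster-expansion logic stays
// sequential, so the sequential semantics of DBSCAN are preserved.
private List<Vector> get_neighbors(Vector pt) {
    return cloud.parallelStream()
            .filter(p -> computeDistance(pt, p) <= eps * eps)
            .collect(java.util.stream.Collectors.toList());
}

The papers above tackle the much harder problem of expanding several clusters concurrently; this only spreads each individual query over your cores.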
class Solution {
    // Function to return the Breadth First Traversal of the given graph.
    public ArrayList<Integer> bfsOfGraph(int V, ArrayList<ArrayList<Integer>> adj) {
        ArrayList<Integer> result = new ArrayList<>();
        Queue<Integer> q = new LinkedList<>();
        q.add(0);
        boolean[] visited = new boolean[V];
        visited[0] = true;
        while (!q.isEmpty()) {
            int v = q.poll();
            result.add(v);
            ArrayList<Integer> adjList = adj.get(v);
            for (int i : adjList) {
                if (!visited[i]) {
                    visited[i] = true;
                    q.add(i);
                }
            }
        }
        return result;
    }
}
Error:
I am attempting the BFS algorithm on an undirected graph and it is showing a segmentation fault error. If anyone has any knowledge regarding the concept, please reply.
Segmentation faults happen in programs (such as the JVM) due to memory errors: either the JVM has a bug that makes it try to use the wrong section of memory on the computer when it's cranked up to use that much buffer space, or it tries to allocate 256 MB of memory and, in the process, uses more space than the computer gave it.
Make sure you update your JVM. Please let me know if you still face the issue.
I'm creating a chess engine as practice in Java. I know it's not recommended due to speed issues, but I'm doing it just for practice.
After implementing minimax with alpha-beta pruning, I thought of implementing a time limit for finding the score of a given move.
Here is the code
private int minimax(MoveNode node, MoveNodeType nodeType, int alpha, int beta, Side side, int depth) throws Exception {
    // isInterestingLine(prevscores, node, side);
    if (depth <= 0) {
        count++;
        return node.evaluateBoard(side);
    }
    // generate child nodes if we haven't
    if (node.childNodes == null || node.childNodes.size() == 0) {
        node.createSingleChild();
    }
    if (nodeType == MoveNodeType.MAX) {
        int bestValue = -1000;
        for (int i = 0; i < node.childNodes.size(); i++) {
            if (node.childNodes.get(i) == null) continue;
            int value = minimax(node.childNodes.get(i), MoveNodeType.MIN, alpha, beta, side, depth - 1);
            bestValue = Math.max(bestValue, value);
            alpha = Math.max(alpha, bestValue);
            if (beta <= alpha) {
                break;
            }
            node.createSingleChild();
        }
        // reCalculateScore();
        return bestValue;
    } else {
        int bestValue = 1000;
        for (int i = 0; i < node.childNodes.size(); i++) {
            if (node.childNodes.get(i) == null) continue;
            int value = minimax(node.childNodes.get(i), MoveNodeType.MAX, alpha, beta, side, depth - 1);
            bestValue = Math.min(bestValue, value);
            beta = Math.min(beta, bestValue);
            if (beta <= alpha) {
                break;
            }
            node.createSingleChild();
        }
        // reCalculateScore();
        return bestValue;
    }
}
and the driver code.
void evaluateMove(Move mv, Board brd) throws Exception {
    System.out.println("Started Comparing! " + this.tree.getRootNode().getMove().toString());
    minmaxThread = new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                bestMoveScore = minimax(tree.getRootNode(), MoveNodeType.MIN, -1000, 1000, side, MAX_DEPTH);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    });
    minmaxThread.start();
}
This is how I implemented the time limit.
long time = System.currentTimeMillis();
moveEvaluator.evaluateMove(move, board.clone());
// busy-wait until the time budget runs out or the search thread finishes
while ((System.currentTimeMillis() - time) < secToCalculate * 1000 && moveEvaluator.minmaxThread.isAlive()) {
}
System.out.println("Time completed! score = " + moveEvaluator.bestMoveScore + " move = " + move + " depth = " + moveEvaluator.searchDepth);
callback.callback(move, moveEvaluator.bestMoveScore);
Now, here is the problem:
You see, it only calculated Bb7, because with depth-first search the time runs out before another line is even calculated.
So I want a way to calculate like the following in a time-limit-based solution.
Here are a few solutions I thought of:
Implementing an isInteresting() function, which takes all the previous scores and checks whether the current line is interesting/winning; if yes, and only then, it calculates the next child nodes.
e.g.
[0,0,0,0,0,0] can be interpreted as a drawn line.
[-2,-3,-5,-2,-1] can be interpreted as a losing line.
Searching to a small depth first and then eliminating all losing lines:
for (int i = min_depth; i <= max_depth; i++) {
    scores = [];
    for (Node childNode : nodesToCalculate) {
        scores.push(minimax(childNode, type, alpha, beta, side, i));
    }
    // decide which child nodes to calculate in the next iterations
}
But neither solution is perfect or efficient: in the first one we are just making a guess, and in the second one we are calculating some nodes more than once.
Is there a better way to do this?
The solution to this problem used by every chess engine is iterative deepening.
Instead of searching to a fixed depth (MAX_DEPTH in your example), you start by searching to a depth of one; when this search is done you start again with a depth of two, and you continue to increase the depth like this until you are out of time. When you are out of time, you can play the move from the last search that completed.
It may seem like a lot of time will be spent on lower-depth iterations that are later replaced by deeper searches, and that the time spent doing so is completely lost, but in practice that's not true. Since searching to depth N takes so much longer than searching to depth N-1, the time spent on the lower-depth searches is always much less than the time spent on the last (deepest) search.
If your engine uses a transposition table, the data in the transposition table from previous iterations will help the later iterations. The alpha-beta algorithm's performance is really sensitive to the order in which moves are searched: the time saved by alpha-beta over plain minimax is optimal when the best move is searched first. If you did a search to depth N-1 before the search to depth N, the transposition table will probably contain a good guess of the best move for most positions, which can then be searched first.
In practice, in an engine using a transposition table and ordering moves at the root based on the previous iteration, it is faster to use iterative deepening than not to use it. For example, it's faster to do a depth 1 search, then a depth 2 search, then a depth 3 search, and so on up to, say, a depth 10 search than it is to do a depth 10 search right away. Plus you get the option to stop the search whenever you want and still have a move to play.
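Here is a minimal driver sketch on top of the minimax from the question, assuming the root is a MAX node whose children are searched as MIN and that childNodes is iterable (findBestMove and the time handling are invented names for illustration, not your engine's actual API):

// Iterative deepening (sketch). Each fully completed depth overwrites bestMove,
// so whenever the clock runs out there is always a move from the last finished search.
Move findBestMove(MoveNode root, Side side, long millisBudget) throws Exception {
    long deadline = System.currentTimeMillis() + millisBudget;
    Move bestMove = null;
    for (int depth = 1; ; depth++) {
        Move candidate = null;
        int bestScore = Integer.MIN_VALUE;
        for (MoveNode child : root.childNodes) {
            if (child == null) continue;
            if (System.currentTimeMillis() >= deadline) {
                return bestMove;                 // out of time: keep the last completed depth
            }
            int score = minimax(child, MoveNodeType.MIN, -1000, 1000, side, depth - 1);
            if (score > bestScore) {
                bestScore = score;
                candidate = child.getMove();
            }
        }
        bestMove = candidate;                    // this depth finished: commit its best move
    }
}

A real engine also checks the clock inside minimax itself and aborts the current iteration cleanly; this sketch only checks between root moves.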
I have the following uni assignment that's been puzzling me. I have to implement a genetic algorithm that allocates processes to processors. More specifically, the problem is the following:
"You have a program that is computed in a parallel processor system. The program is made up of N processes that need to be allocated to n processors (where n is much smaller than N). The communication between processes can be quite time consuming, so the best practice would be to assign processes that need to intercommunicate to the same processor.
In order to reduce the communication time between processes you could allocate all of these processes to the same processor, but this would negate the idea of parallel processing, where every processor needs to contribute to the whole computation.
Consider the following: let Cij be the total amount of communication between process i and process j. Assume that every process needs the same amount of computing power, so that the processing limitations can be handled by assigning the same number of processes to each processor. Use a genetic algorithm to assign N processes to n processors."
The above is a rough translation of the problem description. Now I have the following questions that puzzle me:
1) What would be the best viable way to get the genetic algorithm running? From the theory I have deduced that you need a best possible solution in order to evaluate each generation of the produced population.
2) How can I properly design the whole problem so it can be handled by a program?
I am planning to implement this in Java, but recommendations for other programming languages would be welcome.
The Dude abides. Or El Duderino if you're not into the whole brevity thing.
What you're asking is really a two-part question, but the Genetic Algorithm part can be easily illustrated in concept. I find that giving a basic start can be helpful, but this problem as a "whole" is too complicated to address here.
Genetic Algorithms (GAs) can be used as an optimizer, as you note. In order to apply a GA to a process execution plan, you need to be able to score an execution plan, then clone and mutate the best plans. A GA works by running several plans, cloning the best, and then mutating some of them slightly to see if the offspring (cloned) plans are improved or worsened.
In this example, I created a list of 100 random Integers. Each Integer is a "process" to be run, and the value of the Integer is the "cost" of running that individual process.
List<Integer> processes = new ArrayList<Integer>();
The processes are then added to an ExecutionPlan, which wraps a List<List<Integer>>. This list of lists of Integers will be used to represent 4 processors doing 25 rounds of processing:
class ExecutionPlan implements Comparable<ExecutionPlan> {
    List<List<Integer>> plan;
    int cost;
The total cost of an execution plan will be computed by taking the highest process cost per round (the greatest Integer) and summing the costs of all the rounds. Thus, the goal of the optimizer is to distribute the initial 100 integers (processes) into 25 rounds of "processing" on 4 "processors" such that total cost is as low as possible.
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndInitializePlans(processes);
// loop on generation count
for (int generation = 0; generation < GENERATIONCOUNT; ++generation) {
    computeCostOfPlans(executionPlans);
    // sort plans by cost
    Collections.sort(executionPlans);
    // print execution plan costs
    System.out.println(generation + " = " + executionPlans);
    // clone 5 better plans over 5 worse plans
    // i.e., kill off the least fit and reproduce the best fit.
    cloneBetterPlansOverWorsePlans(executionPlans);
    // mutate 5 cloned plans
    mutateClones(executionPlans);
}
When the program is run, the cost is initially randomly determined, but with each generation it improves. If you run it for 1000 generations and plot the results, a typical run shows the cost dropping steeply in the early generations and then leveling off.
The purpose of the GA is to optimize, or attempt to determine, the best possible solution. The reason it can be applied to your problem is that your ExecutionPlan can be scored, cloned, and mutated. The path to success, therefore, is to separate the problems in your mind. First, figure out how you can make an execution plan that can be scored as to what the cost will be to actually run it on an assumed set of hardware. Add routines to clone and mutate an ExecutionPlan. Once you have that, plug it into this GA example. Good luck and stay cool, dude.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Optimize {
    private static int GENERATIONCOUNT = 1000;
    private static int PROCESSCOUNT = 100;
    private static int MUTATIONCOUNT = PROCESSCOUNT / 10;

    public static void main(String... strings) {
        new Optimize().run();
    }

    // define an execution plan as 25 rounds on 4 processors
    class ExecutionPlan implements Comparable<ExecutionPlan> {
        List<List<Integer>> plan;
        int cost;

        public ExecutionPlan(List<List<Integer>> plan) {
            this.plan = plan;
        }

        @Override
        public int compareTo(ExecutionPlan o) {
            return cost - o.cost;
        }

        @Override
        public String toString() {
            return Integer.toString(cost);
        }
    }

    private void run() {
        // make 100 processes to be completed
        List<Integer> processes = new ArrayList<Integer>();
        // assign them a random cost between 1 and 100;
        for (int index = 0; index < PROCESSCOUNT; ++index) {
            processes.add((int) (Math.random() * 99.0) + 1);
        }
        // make 10 execution plans of 25 execution rounds running on 4 processors;
        List<ExecutionPlan> executionPlans = createAndInitializePlans(processes);
        // loop on generation count
        for (int generation = 0; generation < GENERATIONCOUNT; ++generation) {
            computeCostOfPlans(executionPlans);
            // sort plans by cost
            Collections.sort(executionPlans);
            // print execution plan costs
            System.out.println(generation + " = " + executionPlans);
            // clone 5 better plans over 5 worse plans
            cloneBetterPlansOverWorsePlans(executionPlans);
            // mutate 5 cloned plans
            mutateClones(executionPlans);
        }
    }

    private void mutateClones(List<ExecutionPlan> executionPlans) {
        for (int index = 0; index < executionPlans.size() / 2; ++index) {
            ExecutionPlan execution = executionPlans.get(index);
            // mutate 10 different location swaps, maybe the plan improves
            for (int mutationCount = 0; mutationCount < MUTATIONCOUNT; ++mutationCount) {
                int location1 = (int) (Math.random() * 100.0);
                int location2 = (int) (Math.random() * 100.0);
                // swap two locations
                Integer processCostTemp = execution.plan.get(location1 / 4).get(location1 % 4);
                execution.plan.get(location1 / 4).set(location1 % 4, execution.plan.get(location2 / 4).get(location2 % 4));
                execution.plan.get(location2 / 4).set(location2 % 4, processCostTemp);
            }
        }
    }

    private void cloneBetterPlansOverWorsePlans(List<ExecutionPlan> executionPlans) {
        for (int index = 0; index < executionPlans.size() / 2; ++index) {
            ExecutionPlan execution = executionPlans.get(index);
            List<List<Integer>> clonePlan = new ArrayList<List<Integer>>();
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                clonePlan.add(new ArrayList<Integer>(execution.plan.get(roundNumber)));
            }
            executionPlans.set(index + executionPlans.size() / 2, new ExecutionPlan(clonePlan));
        }
    }

    private void computeCostOfPlans(List<ExecutionPlan> executionPlans) {
        for (ExecutionPlan execution : executionPlans) {
            execution.cost = 0;
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                // cost of a round is the greatest "communication time" of its 4 slots
                List<Integer> round = execution.plan.get(roundNumber);
                int roundCost = round.get(0) > round.get(1) ? round.get(0) : round.get(1);
                roundCost = roundCost > round.get(2) ? roundCost : round.get(2);
                roundCost = roundCost > round.get(3) ? roundCost : round.get(3);
                // add all the round costs to determine the total plan cost
                execution.cost += roundCost;
            }
        }
    }

    private List<ExecutionPlan> createAndInitializePlans(List<Integer> processes) {
        List<ExecutionPlan> executionPlans = new ArrayList<ExecutionPlan>();
        for (int planNumber = 0; planNumber < 10; ++planNumber) {
            // randomize the processes for this plan
            Collections.shuffle(processes);
            // and make the plan
            List<List<Integer>> currentPlan = new ArrayList<List<Integer>>();
            for (int roundNumber = 0; roundNumber < 25; ++roundNumber) {
                List<Integer> round = new ArrayList<Integer>();
                round.add(processes.get(4 * roundNumber + 0));
                round.add(processes.get(4 * roundNumber + 1));
                round.add(processes.get(4 * roundNumber + 2));
                round.add(processes.get(4 * roundNumber + 3));
                currentPlan.add(round);
            }
            executionPlans.add(new ExecutionPlan(currentPlan));
        }
        return executionPlans;
    }
}
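To connect the example back to the assignment: there, a chromosome is a balanced assignment of N processes to n processors, and the score is driven by the communication matrix Cij instead of per-process costs. Here is a minimal sketch of that fitness function, where assignment[i] holds the processor of process i and C[i][j] is the given communication matrix (both names are mine, not from the assignment):

// Fitness sketch for the assignment problem: sum the communication volume of
// every pair of processes that lands on *different* processors. Lower is better.
static long communicationCost(int[] assignment, long[][] C) {
    long cost = 0;
    for (int i = 0; i < assignment.length; i++) {
        for (int j = i + 1; j < assignment.length; j++) {
            if (assignment[i] != assignment[j]) {
                cost += C[i][j];
            }
        }
    }
    return cost;
}

A swap mutation (exchange the processors of two processes) preserves the equal-load constraint from the problem statement, so the clone-and-mutate machinery above carries over essentially unchanged.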
I have two versions of a program with the same purpose: to calculate how many prime numbers there are between 0 and n.
The first version uses concurrency: a Callable class "does the math" and the results are retrieved through an array of Futures. There are as many threads created as there are processors in my computer (4).
The second version is implemented via RMI. All four servers are registered on the local host. The servers work in parallel as well, obviously.
I would expect the second version to be slower than the first, because I guess the network would introduce latency while the other version would just run the program concurrently.
However, the RMI version is around twice as fast as the parallel version... Why is this happening?!
I didn't paste any code because it'd be huge, but ask for it in case you need it and I'll see what I can do...
EDIT: adding the code. I left comments in the sections where code unnecessary to the question used to be.
Parallel version
public class taskPrimes implements Callable<Long> {
    private final long x;
    private final long y;
    private Long total = new Long(0);

    public taskPrimes(long x, long y) {
        this.x = x;
        this.y = y;
    }

    public static boolean isPrime(long n) {
        if (n <= 1) return false;
        for (long i = 2; i <= Math.sqrt(n); i++)
            if (n % i == 0) return false;
        return true;
    }

    public Long call() {
        for (long i = x; i <= y; i++)
            if (isPrime(i)) total++;
        return total;
    }
}
public class paralelPrimes {
    public static void main(String[] args) throws Exception {
        // here some variables...
        int nTasks = Runtime.getRuntime().availableProcessors();
        ArrayList<Future<Long>> partial = new ArrayList<Future<Long>>();
        ExecutorService ept = Executors.newFixedThreadPool(nTasks); // ThreadPoolExecutor has no no-arg constructor
        for (int i = 0; i < nTasks; i++) {
            partial.add(ept.submit(new taskPrimes(x, y))); // x and y are the limits of the range
            // sliding window here
        }
        for (Future<Long> iterator : partial)
            try { total += iterator.get(); } catch (Exception e) {}
    }
}
RMI version
Server
public class serverPrimes extends UnicastRemoteObject implements interfacePrimes {
    public serverPrimes() throws RemoteException {}

    @Override
    public int primes(int x, int y) throws RemoteException {
        int total = 0;
        for (int i = x; i <= y; i++)
            if (isPrime(i)) total++;
        return total;
    }

    @Override
    public boolean isPrime(int n) throws RemoteException {
        if (n <= 1) return false;
        for (int i = 2; i <= Math.sqrt(n); i++)
            if (n % i == 0) return false;
        return true;
    }

    public static void main(String[] args) throws Exception {
        interfacePrimes RemoteObject1 = new serverPrimes();
        interfacePrimes RemoteObject2 = new serverPrimes();
        interfacePrimes RemoteObject3 = new serverPrimes();
        interfacePrimes RemoteObject4 = new serverPrimes();
        Naming.bind("Server1", RemoteObject1);
        Naming.bind("Server2", RemoteObject2);
        Naming.bind("Server3", RemoteObject3);
        Naming.bind("Server4", RemoteObject4);
    }
}
Client
public class clientPrimes implements Runnable {
    private int x;
    private int y;
    private interfacePrimes RemoteObjectReference;
    private static AtomicInteger total = new AtomicInteger();

    public clientPrimes(int x, int y, interfacePrimes RemoteObjectReference) {
        this.x = x;
        this.y = y;
        this.RemoteObjectReference = RemoteObjectReference;
    }

    @Override
    public void run() {
        try {
            total.addAndGet(RemoteObjectReference.primes(x, y));
        } catch (RemoteException e) {}
    }

    public static void main(String[] args) throws Exception {
        // some variables here...
        int nServers = 4;
        ExecutorService e = Executors.newFixedThreadPool(nServers);
        double t = System.nanoTime();
        for (int i = 1; i <= nServers; i++) {
            e.submit(new clientPrimes(xVentana, yVentana, (interfacePrimes) Naming.lookup("//localhost/Server" + i)));
            // sliding window here
        }
        e.shutdown();
        while (!e.isTerminated());
        t = System.nanoTime() - t;
    }
}
One interesting thing to consider is that, by default, the JVM runs in client mode. This means that threads won't span the cores in the most aggressive way. Running the program with the -server option can influence the result, although, as mentioned, the algorithm design is crucial and the concurrent version may have bottlenecks. Given the problem, there is little chance that there is a bottleneck in your algorithm, but it certainly needs to be considered.
The RMI version truly runs in parallel because each object runs on a different machine; since this tends to be a processing problem more than a communication problem, the latency plays an unimportant part.
[UPDATE]
Now that I've seen your code, let's get into some more details.
You are relying on the ThreadPoolExecutor and Future to perform the thread control and synchronization for you. This means (according to the documentation) that your running objects will be allocated on an existing thread, and once an object finishes its computation the thread is returned to the pool; meanwhile, the Future will check periodically whether the computation has finished so it can collect the value.
This scenario is best suited to computations that are performed periodically, in a way that lets the thread pool increase performance by having the threads pre-allocated (incurring the overhead of thread creation only the first time, when the threads aren't there yet).
Your implementation is correct, but it is more centered on programmer convenience (there is nothing wrong with this; I am always defending this point of view) than on system performance.
The RMI version performs differently due (mainly) to two things:
1 - You said you are running on the same machine; most OSes will recognize localhost, 127.0.0.1, or even the real self IP address as being their own address and perform some optimizations on the communication, so there is little overhead from the network here.
2 - The RMI system will create a separate thread for each server object you created (as I mentioned before), and these servers will start computing as soon as they get called.
Things you should try to experiment with:
1 - Try to run your RMI version truly on a network; if you can configure it for 10 Mbps, that would be better for seeing the communication overhead (although, since it is a one-shot communication, it may have too little impact to notice; you could change your client application to call for the calculation multiple times, and then you would see the latency along the way).
2 - Try to change your parallel implementation to use Threads directly with no Future (you could use Thread.join to monitor the end of execution) and then use the -server option on the machine (although, sometimes, the JVM performs a check to see whether the machine configuration can truly be said to be a server and will decline to move to that profile). The main problem is that if your threads don't get to use all the computer's cores, you won't see any performance improvement. Also try to perform the calculation many times to overcome the overhead of thread creation. A sketch of this raw-Thread variant follows below.
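For the second experiment, here is a minimal sketch of the raw-Thread variant (countPrimes is an invented name; it reuses the static isPrime from the question's taskPrimes):

// Raw threads instead of an executor: each thread counts primes in its own
// sub-range and the partial counts are summed after join().
static long countPrimes(long n, int nTasks) throws InterruptedException {
    long[] partial = new long[nTasks];
    Thread[] workers = new Thread[nTasks];
    long chunk = (n + nTasks - 1) / nTasks;        // size of each sub-range
    for (int i = 0; i < nTasks; i++) {
        final int id = i;
        final long lo = id * chunk;
        final long hi = Math.min(n, lo + chunk - 1);
        workers[i] = new Thread(() -> {
            long count = 0;
            for (long v = lo; v <= hi; v++) {
                if (taskPrimes.isPrime(v)) count++;
            }
            partial[id] = count;                   // each thread writes only its own slot
        });
        workers[i].start();
    }
    long total = 0;
    for (Thread w : workers) w.join();             // wait for all workers to finish
    for (long p : partial) total += p;
    return total;
}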
Hope that helps to elucidate the situation :)
Cheers
It depends on how your algorithms are designed for the parallel and concurrent solutions. There is no criterion by which parallel must be better than concurrent or vice versa. For example, if your concurrent solution has many synchronized blocks, that can drop your performance; in the other case, maybe the communication in your parallel algorithm is minimal, so there is no overhead on the network.
If you can get a copy of the book by Peter Pacheco, it can clear up some ideas: http://www.cs.usfca.edu/~peter/ipp/
Given the details you provided, it will mostly depend on how large a range you're using and how efficiently you distribute the work to the servers.
For instance, I'll bet that for a small range N you will probably see no speedup from distributing via RMI. In this case, the RMI (network) overhead will likely outweigh the benefit of distributing over multiple servers. When N becomes large, and with an efficient distribution algorithm, this overhead becomes more and more negligible with regard to the actual computation time.
For example, assuming homogeneous servers, a relatively efficient distribution would be to tell each server to compute the primes for all the numbers n such that n % P = i, where n <= N, P is the number of servers, i is an index in the range [0, P-1] assigned to each server, and % is the modulo operation. The code sketch below illustrates this.
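A client-side sketch of that distribution, reusing the Naming.lookup pattern from the question; note that primesByStride is an assumed server method (the posted interface only has the range-based primes(x, y)), counting primes among offset, offset+P, offset+2P, ... up to N:

import java.rmi.Naming;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class stridedClient {
    public static void main(String[] args) throws Exception {
        int P = 4;                         // number of servers
        int N = 10_000_000;                // upper bound of the search
        ExecutorService pool = Executors.newFixedThreadPool(P);
        List<Future<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < P; i++) {
            final int offset = i;
            final interfacePrimes server =
                    (interfacePrimes) Naming.lookup("//localhost/Server" + (i + 1));
            // hypothetical method: count primes among offset, offset+P, offset+2P, ... <= N
            parts.add(pool.submit(() -> server.primesByStride(offset, P, N)));
        }
        int total = 0;
        for (Future<Integer> f : parts) total += f.get();
        pool.shutdown();
        System.out.println(total + " primes up to " + N);
    }
}

The point of the stride is load balancing: large (expensive) candidates get spread evenly over the servers instead of piling up in the last contiguous range.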
Dijkstra's algorithm has a step which says "choose the node with the shortest path". I realize that this step is unnecessary if we don't throw a node out of the graph/queue. This works great to my knowledge, with no known disadvantage. Here is the code. Please instruct me if it fails; if it does, then how? [EDIT => THIS CODE IS TESTED AND WORKS WELL, BUT THERE IS A CHANCE MY TEST CASES WERE NOT EXHAUSTIVE, THUS POSTING IT ON STACKOVERFLOW]
public Map<Integer, Integer> findShortest(int source) {
    final Map<Integer, Integer> vertexMinDistance = new HashMap<Integer, Integer>();
    final Queue<Integer> queue = new LinkedList<Integer>();
    queue.add(source);
    vertexMinDistance.put(source, 0);
    while (!queue.isEmpty()) {
        source = queue.poll();
        List<Edge> adjlist = graph.getAdj(source);
        int sourceDistance = vertexMinDistance.get(source);
        for (Edge edge : adjlist) {
            int adjVertex = edge.getVertex();
            if (vertexMinDistance.containsKey(adjVertex)) {
                int vertexDistance = vertexMinDistance.get(adjVertex);
                if (vertexDistance > (sourceDistance + edge.getDistance())) {
                    // previous bug
                    // vertexMinDistance.put(adjVertex, vertexDistance);
                    vertexMinDistance.put(adjVertex, sourceDistance + edge.getDistance());
                }
            } else {
                queue.add(adjVertex);
                vertexMinDistance.put(adjVertex, edge.getDistance());
            }
        }
    }
    return vertexMinDistance;
}
Problem 1
I think there is a bug in the code where it says:
int vertexDistance = vertexMinDistance.get(adjVertex);
if (vertexDistance > (sourceDistance + edge.getDistance())) {
    vertexMinDistance.put(adjVertex, vertexDistance);
}
because this has no effect (vertexMinDistance for adjVertex is set back to its original value).
Better would be something like:
int vertexDistance = vertexMinDistance.get(adjVertex);
int newDistance = sourceDistance + edge.getDistance();
if (vertexDistance > newDistance) {
    vertexMinDistance.put(adjVertex, newDistance);
}
Problem 2
You also need to add the adjVertex into the queue using something like:
int vertexDistance = vertexMinDistance.get(adjVertex);
int newDistance = sourceDistance + edge.getDistance();
if (vertexDistance > newDistance) {
    vertexMinDistance.put(adjVertex, newDistance);
    queue.add(adjVertex);
}
If you don't do this then you will get an incorrect answer for graphs such as:
A->B (1)
A->C (10)
B->C (1)
B->D (10)
C->D (1)
The correct path is A->B->C->D with weight 3, but without the modification I believe your algorithm will choose a longer path (as it doesn't reexamine C once it has found a shorter path to it).
High level response
With these modifications I think this approach is basically sound, but you should be careful about the computational complexity.
Dijkstra will only need to go round the main loop V times (where V is the number of vertices in the graph), while your algorithm may need many more loops for certain graphs.
You will still get the correct answer, but it may take longer.
Although the worst-case complexity will be much worse than Dijkstra, I would be interested in how well it performs in practice. My guess is that it will work well for sparse almost tree-like graphs, but less well for dense graphs.
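For comparison, here is a minimal sketch of the standard priority-queue formulation that the complexity remark refers to, written against the same graph/Edge API as the question; stale queue entries are skipped on poll instead of being removed (the usual lazy decrease-key trick):

// Textbook Dijkstra with a PriorityQueue: O((V + E) log V).
// Queue entries are {vertex, distance-at-insertion-time}.
public Map<Integer, Integer> findShortestDijkstra(int source) {
    final Map<Integer, Integer> dist = new HashMap<>();
    final PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
    dist.put(source, 0);
    pq.add(new int[] { source, 0 });
    while (!pq.isEmpty()) {
        int[] top = pq.poll();
        int u = top[0];
        if (top[1] > dist.get(u)) continue;      // stale entry: a shorter path was found later
        for (Edge edge : graph.getAdj(u)) {
            int v = edge.getVertex();
            int nd = dist.get(u) + edge.getDistance();
            if (!dist.containsKey(v) || nd < dist.get(v)) {
                dist.put(v, nd);
                pq.add(new int[] { v, nd });     // lazy decrease-key: re-insert with new distance
            }
        }
    }
    return dist;
}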