Data structure for finding the shortest path between two strings - Java

I am creating a program that takes a word list of 5,000 strings and finds the shortest path from one string to another. For example, abc -> bac could print "abc, bbc, bac".
I am fairly sure what I want to do; the only thing I'm unsure about is which data structure should represent my word list. The goal is for the search (BFS) to run as fast as possible, so sacrificing some space is no problem. I am considering either a BST or an adjacency list, but since I'm no expert on the time complexity of data structures, I want to be certain before I start adjusting my code. Can anyone recommend one of these structures over the other? Or have I missed a data structure that is an obvious alternative?

It looks like what you are looking for is the Levenshtein distance. Here is the Rosetta Code implementation; you should be able to adapt it to suit your needs:
public class Levenshtein {
    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0
        int [] costs = new int [b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0; nw = lev(i - 1, j)
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    public static void main(String [] args) {
        String [] data = { "kitten", "sitting", "saturday", "sunday", "rosettacode", "raisethysword" };
        for (int i = 0; i < data.length; i += 2)
            System.out.println("distance(" + data[i] + ", " + data[i+1] + ") = " + distance(data[i], data[i+1]));
    }
}
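For the data-structure part of the question, one possible approach is to precompute an adjacency list (a map from each word to its neighbours) once and then run BFS over it. The sketch below is only an illustration under assumptions: it treats two words as neighbours when they have the same length and differ in exactly one character (matching the abc -> bbc -> bac example), and the names WordLadder, oneSubstitution, buildGraph and shortestPath are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordLadder {
    // True when a and b have equal length and differ in exactly one position.
    static boolean oneSubstitution(String a, String b) {
        if (a.length() != b.length()) return false;
        int diff = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i) && ++diff > 1) return false;
        }
        return diff == 1;
    }

    // Builds the adjacency list once by comparing every pair of words.
    static Map<String, List<String>> buildGraph(List<String> words) {
        Map<String, List<String>> graph = new HashMap<>();
        for (String w : words) graph.put(w, new ArrayList<>());
        for (int i = 0; i < words.size(); i++) {
            for (int j = i + 1; j < words.size(); j++) {
                String a = words.get(i), b = words.get(j);
                if (oneSubstitution(a, b)) {
                    graph.get(a).add(b);
                    graph.get(b).add(a);
                }
            }
        }
        return graph;
    }

    // BFS from start; returns the shortest path, or an empty list if none exists.
    static List<String> shortestPath(Map<String, List<String>> graph, String start, String goal) {
        if (!graph.containsKey(start) || !graph.containsKey(goal)) return Collections.emptyList();
        Map<String, String> parent = new HashMap<>();  // also serves as the visited set
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(goal)) {
                List<String> path = new ArrayList<>();
                for (String s = cur; s != null; s = parent.get(s)) path.add(s);
                Collections.reverse(path);
                return path;
            }
            for (String next : graph.get(cur)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

With 5,000 words the pairwise comparison is about 12.5 million cheap checks, done once; each BFS afterwards only touches the words and edges it can reach. If insertions and deletions should also count as steps, the neighbour test would have to change accordingly.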

Related

Dijkstra algorithm on a matrix

I have to create an N * M matrix and fill it with values between 0 and 9. One of the cells should be "A", the starting point of the graph, and I should find the shortest path to the cell "B" (both are placed at random positions in the matrix). A value of 0 counts as an obstacle, and 2 < N, M < 100.
I have to print the exact route of the shortest path and its total cost. Also, the problem has to be solved with Dijkstra's algorithm.
I haven't gotten past filling the matrix. I store the values in a 2D String array, but I think I should use different arrays or maybe Maps for storing the positions of key values such as the start and end points. I've been thinking about this for 2 days now because I'm a total noob in Java and not much better at programming in general. I'm mainly looking for guidance on how to store the data and what I should actually store in order to get to the end, because I think I am overcomplicating the problem.
This is the matrix generating part of the code.
int N = ThreadLocalRandom.current().nextInt(3,7);
int M = ThreadLocalRandom.current().nextInt(3,7);
int J = ThreadLocalRandom.current().nextInt(0,(Math.min(N, M))/2);
int K = 0;
int aPosX = ThreadLocalRandom.current().nextInt(0,N);
int aPosY = ThreadLocalRandom.current().nextInt(0,M);
int bPosX = ThreadLocalRandom.current().nextInt(0,N);
int bPosY = ThreadLocalRandom.current().nextInt(0,M);
String[][] matrix = new String[N][M];
int[][] map = new int[N][M];
int shortestPath = 10;
int currentPosX,currentPosY;
int shortestPosX, shortestPosY;
public void generateMatrix(){
for (int i = 0; i < N; i++) {
for (int j = 0; j < M; j++) {
K = ThreadLocalRandom.current().nextInt(0,10);
matrix[i][j] = String.valueOf(K);
}
}
}
public void createStartAndFinish(){
matrix[aPosX][aPosY] = "A";
matrix[bPosX][bPosY] = "B";
}
}
This part finds the lowest-cost adjacent tiles and steps onto them, but it does generate an out-of-bounds exception. I'm also aware that it has nothing to do with Dijkstra's algorithm, but this is my starting point.
public void solveMatrix(){
visited[aPosX][aPosY] = true;
currentPosX = aPosX;
currentPosY = aPosY;
while (!matrix[currentPosX - 1][currentPosY].equals("B") ||
!matrix[currentPosX + 1][currentPosY].equals("B") ||
!matrix[currentPosX][currentPosY - 1].equals("B") ||
!matrix[currentPosX][currentPosY + 1].equals("B")) {
if(currentPosX > 0) {
if(!visited[currentPosX - 1][currentPosY] && Integer.parseInt(matrix[currentPosX - 1][currentPosY]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX - 1][currentPosY]);
shortestPosX = currentPosX - 1;
shortestPosY = currentPosY;
}
}
if(currentPosX + 1 < N){
if(!visited[currentPosX + 1][currentPosY] && Integer.parseInt(matrix[currentPosX + 1][currentPosY]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX + 1][currentPosY]);
shortestPosX = currentPosX + 1;
shortestPosY = currentPosY;
}
}
if(currentPosY > 0){
if(!visited[currentPosX][currentPosY - 1] && Integer.parseInt(matrix[currentPosX][currentPosY - 1]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX][currentPosY - 1]);
shortestPosX = currentPosX;
shortestPosY = currentPosY - 1;
}
}
if(currentPosY - 1 < M){
if(!visited[currentPosX][currentPosY + 1] && Integer.parseInt(matrix[currentPosX][currentPosY + 1]) < shortestPath) {
shortestPath = Integer.parseInt(matrix[currentPosX][currentPosY + 1]);
shortestPosX = currentPosX;
shortestPosY = currentPosY + 1;
}
}
visited[shortestPosX][shortestPosY] = true;
currentPosX = shortestPosX;
currentPosY = shortestPosY;
System.out.println(shortestPosX + " " + shortestPosY + " " + shortestPath);
shortestPath = 10;
}
}
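One possible way to store things for this problem is an int[][] of step costs plus separate row/column coordinates for A and B, with a priority queue driving Dijkstra's algorithm. The sketch below is not a definitive solution and rests on several assumptions the question does not confirm: moving onto a cell costs that cell's value, a 0 is impassable unless it is the end cell, moves go up, down, left and right only, and the names GridDijkstra and shortestPath are invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class GridDijkstra {
    static final int[] DR = {-1, 1, 0, 0};
    static final int[] DC = {0, 0, -1, 1};

    // Returns the cheapest route as {row, col} cells from start to end (inclusive),
    // or an empty list when the end is unreachable. total[0] receives the route cost.
    static List<int[]> shortestPath(int[][] cost, int sr, int sc, int er, int ec, int[] total) {
        int n = cost.length, m = cost[0].length;
        int[][] dist = new int[n][m];
        int[][] prev = new int[n][m];  // predecessor encoded as row * m + col, -1 = none
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        for (int[] row : prev) Arrays.fill(row, -1);
        dist[sr][sc] = 0;
        // Queue entries are {distance, row, col}, ordered by distance.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        pq.add(new int[] {0, sr, sc});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            int d = cur[0], r = cur[1], c = cur[2];
            if (d > dist[r][c]) continue;      // stale queue entry
            if (r == er && c == ec) break;     // end cell settled
            for (int k = 0; k < 4; k++) {
                int nr = r + DR[k], nc = c + DC[k];
                // Check bounds before touching the arrays.
                if (nr < 0 || nr >= n || nc < 0 || nc >= m) continue;
                boolean isEnd = nr == er && nc == ec;
                if (cost[nr][nc] == 0 && !isEnd) continue;  // obstacle (assumption)
                int nd = d + cost[nr][nc];
                if (nd < dist[nr][nc]) {
                    dist[nr][nc] = nd;
                    prev[nr][nc] = r * m + c;
                    pq.add(new int[] {nd, nr, nc});
                }
            }
        }
        List<int[]> path = new ArrayList<>();
        if (dist[er][ec] == Integer.MAX_VALUE) return path;
        total[0] = dist[er][ec];
        // Walk the predecessor links back from the end cell.
        for (int cell = er * m + ec; cell != -1; cell = prev[cell / m][cell % m]) {
            path.add(new int[] {cell / m, cell % m});
        }
        Collections.reverse(path);
        return path;
    }
}
```

Keeping the costs as ints avoids the repeated Integer.parseInt calls, and checking the bounds before every array access is what prevents the out-of-bounds exception.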

Creating an array from the values of another array?

New to Java and I am working on an RLE encoder. A method I am working on right now needs to convert an array such as:
{13,13,13,4,4,4,4,4,4}
to
{3,13,6,4}.
It counts how many times each number appears consecutively, stores that count at index [i], and then stores the number itself at index [i+1].
My current issue is that if a number is repeated more than 15 times, I have to start a new "run". So {13,13,13,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,} (there are 20 4's in succession here) will print {3,13,15,4,5,4}. My code currently prints {3,15,20,4}.
import java.util.Arrays;
public class testing {
public static void main(String[] args) {
byte [] flatdata = {15,15,15,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4};
int count = 1;
for (int i = 0; i < flatdata.length - 1; i++) {
if (flatdata[i] != flatdata[i + 1]) {
count++; }
}
byte numLength = 1;
byte indexNum = 0;
int newArraySize = count * 2;
byte[] newArray = new byte[newArraySize];
byte[] arrayWithTotalIndexes = new byte[newArraySize];
int i;
for (i = 0; i < flatdata.length - 1; i++) {
if (flatdata[i] != flatdata[i + 1]) {
newArray[indexNum] = numLength;
newArray[indexNum + 1] = flatdata[i];
indexNum = (byte) (indexNum + 2);
numLength = 1;
} else {
numLength++;
}
}
if (flatdata[i - 1] == flatdata[i]) {
newArray[indexNum] = numLength;
newArray[indexNum + 1] = flatdata[i];
} else {
newArray[indexNum] = numLength;
newArray[indexNum + 1] = flatdata[i];
}
System.out.println(Arrays.toString(flatdata));
System.out.println(Arrays.toString(newArray));
System.out.println("countRuns: " + count);
System.out.println(Arrays.toString(arrayWithTotalIndexes));
byte[] desiredArray = {3,15,15,4,5,4};
System.out.println("Desired Array: " + Arrays.toString(desiredArray));
}
I know my first step has to be to adjust the size of the new array to include the extra entries of the array.
public static void main(String [] args){
byte [] arr = {15,15,15,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,};
for (int i = 0; i < arr.length; i++) {
int count1 = 0;
for (int j = 0; j < arr.length; j++) {
if (arr[i] == arr[j]) {
count++;
}
}
if (count1 > 15) {
System.out.println("true");
break;
}
System.out.println("false");
}
}
I thought about running a separate loop to check whether a number is repeated more than 15 times, in order to change the size of the new array, but to no avail.
Here is a link to a previous question I asked regarding the same method in case it makes things more clear.
Creating an array from another array?
Would appreciate any tips or ideas on how to tackle this.
Expand the test for starting a new result pair by changing:
if (flatdata[i] != flatdata[i + 1])
to:
if (flatdata[i] != flatdata[i + 1] || numLength == 15)
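The output array still has to be large enough for the extra pairs. One way to sidestep the size calculation is to write into a growable buffer; the following is a minimal sketch of that idea, where the class name Rle, the method encode and the constant MAX_RUN are assumptions made for this example rather than anything from the question.

```java
import java.io.ByteArrayOutputStream;

class Rle {
    static final int MAX_RUN = 15;  // longest run a single (count, value) pair may hold

    // Encodes flat as (count, value) pairs, splitting runs longer than MAX_RUN.
    static byte[] encode(byte[] flat) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < flat.length) {
            int run = 1;
            // Extend the run while the value repeats and the cap is not reached.
            while (i + run < flat.length && flat[i + run] == flat[i] && run < MAX_RUN) {
                run++;
            }
            out.write(run);      // count
            out.write(flat[i]);  // value
            i += run;
        }
        return out.toByteArray();
    }
}
```

For {15,15,15} followed by twenty 4s this yields {3,15,15,4,5,4}, the desired array from the question.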

Smallest sum of triplet products where the middle element is removed using Dynamic Programming

I am given a sequence of N numbers (4 ≤ N ≤ 150). An index i (0 < i < N - 1) is picked, and the i-th number is multiplied by its left and right neighbours, in other words by the numbers at i-1 and i+1. Then the i-th number is removed. This is repeated until only two numbers are left. The goal is to find the smallest sum of these products, which depends on the order in which the indices are picked.
E.g. for the sequence 44, 45, 5, 39, 15, 22, 10 the smallest sum would be 17775
using the indices in the following order: 1->3->4->5->2 which is the sum:
44*45*5 + 5*39*15 + 5*15*22 + 5*22*10 + 44*5*10 = 9900 + 2925 + 1650 + 1100 + 2200 = 17775
I have found a solution using a recursive function:
public static int smallestSum(List<Integer> values) {
if (values.size() == 3)
return values.get(0) * values.get(1) * values.get(2);
else {
int ret = Integer.MAX_VALUE;
for (int i = 1; i < values.size() - 1; i++) {
List<Integer> copy = new ArrayList<Integer>(values);
copy.remove(i);
int val = smallestSum(copy) + values.get(i - 1) * values.get(i) * values.get(i + 1);
if (val < ret) ret = val;
}
return ret;
}
}
However, this solution is feasible only for small N, not for larger inputs. What I am looking for is a way to do this using an iterative Dynamic Programming approach.
The optimal substructure needed for a DP is that, given the identity of the last element removed, the elimination strategy for the elements to the left is independent of the elimination strategy for the elements to the right. Here's a new recursive function (smallestSumA, together with the version from the question and a test harness comparing the two) incorporating this observation:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Foo {
    public static void main(String[] args) {
        Random r = new Random();
        for (int i = 0; i < 10000; i++) {
            List<Integer> values = new ArrayList<Integer>();
            for (int n = 3 + r.nextInt(8); n > 0; n--) {
                values.add(r.nextInt(100));
            }
            int a = smallestSumA(values, 0, values.size() - 1);
            int q = smallestSumQ(values);
            if (q != a) {
                System.err.println("oops");
                System.err.println(q);
                System.err.println(a);
                System.err.println(values);
            }
        }
    }

    public static int smallestSumA(List<Integer> values, int first, int last) {
        if (first + 2 > last)
            return 0;
        int ret = Integer.MAX_VALUE;
        for (int i = first + 1; i <= last - 1; i++) {
            int val = (smallestSumA(values, first, i)
                    + values.get(first) * values.get(i) * values.get(last) + smallestSumA(values, i, last));
            if (val < ret)
                ret = val;
        }
        return ret;
    }

    public static int smallestSumQ(List<Integer> values) {
        if (values.size() == 3)
            return values.get(0) * values.get(1) * values.get(2);
        else {
            int ret = Integer.MAX_VALUE;
            for (int i = 1; i < values.size() - 1; i++) {
                List<Integer> copy = new ArrayList<Integer>(values);
                copy.remove(i);
                int val = smallestSumQ(copy) + values.get(i - 1) * values.get(i) * values.get(i + 1);
                if (val < ret)
                    ret = val;
            }
            return ret;
        }
    }
}
Invoke as smallestSumA(values, 0, values.size() - 1).
To get the DP, observe that there are only N choose 2 different settings for first and last, and memoize. The running time is O(N^3).
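The memoization step could look roughly like the sketch below. It keeps the same recurrence as smallestSumA but caches one result per (first, last) pair. The class name MemoSmallestSum and the helper solve are made up for this illustration, it uses long to reduce the risk of overflow, and it assumes all values are non-negative so that -1 can mark an empty cache slot.

```java
import java.util.Arrays;
import java.util.List;

class MemoSmallestSum {
    static long smallestSum(List<Integer> values) {
        int n = values.size();
        long[][] memo = new long[n][n];
        for (long[] row : memo) Arrays.fill(row, -1);  // -1 = not computed yet
        return solve(values, 0, n - 1, memo);
    }

    // Cheapest way to remove everything strictly between first and last.
    private static long solve(List<Integer> v, int first, int last, long[][] memo) {
        if (first + 2 > last) return 0;  // nothing between the two ends
        if (memo[first][last] >= 0) return memo[first][last];
        long best = Long.MAX_VALUE;
        // i is the LAST element removed between first and last.
        for (int i = first + 1; i < last; i++) {
            long val = solve(v, first, i, memo)
                    + (long) v.get(first) * v.get(i) * v.get(last)
                    + solve(v, i, last, memo);
            best = Math.min(best, val);
        }
        memo[first][last] = best;
        return best;
    }
}
```

For the sequence 44, 45, 5, 39, 15, 22, 10 from the question this returns 17775. Each of the O(N^2) cache entries is filled once with an O(N) loop, which gives the O(N^3) bound.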
If anyone is interested in a DP solution based on David Eisenstat's recursive solution, here is an iterative one (for large values it is useful to replace the ints with longs to avoid overflow):
public static int smallestSum(List<Integer> values) {
    // table[a][b] = smallest sum for removing everything strictly between a and b
    int[][] table = new int[values.size()][values.size()];
    for (int i = 2; i < values.size(); i++) {           // i = distance between the two ends
        for (int j = 0; j + i < values.size(); j++) {   // j = left end
            int ret = Integer.MAX_VALUE;
            for (int k = j + 1; k <= j + i - 1; k++) {  // k = last element removed
                int val = table[j][k] + values.get(j) * values.get(k) * values.get(j + i) + table[k][j + i];
                if (val < ret) ret = val;
            }
            table[j][j + i] = ret;
        }
    }
    return table[0][values.size() - 1];
}

How to test if strings contain only one difference efficiently?

I was asked this question in a technical interview. The question is: given a target and an array of strings, return an array that contains all strings with ONLY one difference from the target.
For example, if the target is cat, then catt, caT, caa, ca, at <-- all have exactly one difference. Conversely, cat, cattt, dog, flower, c <-- do not have exactly one difference and should not be returned.
oneDiff(String target, String[] a) ...
My approach was:
ans = []
for all elements e in the array
    count -> 0
    if absolute(e's length - target's length) > 1
        continue
    endif
    for all characters c in e
        scan through, increment count if a difference is found
    endfor
    if (count == 1)
        add e to ans
    endif
endfor
return ans
But the interviewer wasn't happy with the above. Does anyone have any efficient/clever ideas?
Thanks
As mentioned by zubergu, the Levenshtein distance will solve your problem. You can find Levenshtein distance in Java here.
Edit: Since you tagged it as Java, you can run the following Java code:
public class Levenshtein {
    public static int distance(String a, String b) {
        a = a.toLowerCase();
        b = b.toLowerCase();
        // i == 0
        int [] costs = new int [b.length() + 1];
        for (int j = 0; j < costs.length; j++)
            costs[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            // j == 0; nw = lev(i - 1, j)
            costs[0] = i;
            int nw = i - 1;
            for (int j = 1; j <= b.length(); j++) {
                int cj = Math.min(1 + Math.min(costs[j], costs[j - 1]), a.charAt(i - 1) == b.charAt(j - 1) ? nw : nw + 1);
                nw = costs[j];
                costs[j] = cj;
            }
        }
        return costs[b.length()];
    }

    public static void main(String [] args) {
        String comparison = "cat";
        String [] data = { "cattt", "catt", "caT", "caa", "ca", "at" };
        for (int i = 0; i < data.length; i++)
            System.out.println("distance(" + comparison + ", " + data[i] + ") = " + distance(comparison, data[i]));
    }
}
If you run the code you will see the following output:
distance(cat, cattt) = 2
distance(cat, catt) = 1
distance(cat, caT) = 0
distance(cat, caa) = 1
distance(cat, ca) = 1
distance(cat, at) = 1
If the distance is 0 or 1, then it's acceptable. (caT scores 0 only because distance() lowercases both strings before comparing.)
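Since only "exactly one difference" matters, a full Levenshtein table is not strictly needed; a single pass per string can decide it. The sketch below is one possible take, not the interviewer's expected answer. It assumes the comparison is case-sensitive (so caT counts as one difference and cat itself is rejected, as in the question's examples), and the names OneDiff, isOneEdit and oneDiff are chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

class OneDiff {
    // True when a and b differ by exactly one insertion, deletion or substitution.
    static boolean isOneEdit(String a, String b) {
        if (a.length() > b.length()) return isOneEdit(b, a);  // make a the shorter one
        int la = a.length(), lb = b.length();
        if (lb - la > 1) return false;
        int i = 0;
        while (i < la && a.charAt(i) == b.charAt(i)) i++;     // skip the common prefix
        if (i == la) return lb == la + 1;  // a is a prefix of b; identical strings fail here
        if (la == lb) {
            // Same length: one substitution at i, the rest must match.
            return a.substring(i + 1).equals(b.substring(i + 1));
        }
        // b is one longer: b has an extra character at i.
        return a.substring(i).equals(b.substring(i + 1));
    }

    // Collects every candidate that is exactly one edit away from target.
    static List<String> oneDiff(String target, String[] candidates) {
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (isOneEdit(target, s)) result.add(s);
        }
        return result;
    }
}
```

Each candidate is examined in time proportional to its length, and no table is allocated.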

CYK algorithm implementation java

I'm trying to implement the CYK algorithm based on the Wikipedia pseudocode. When I test the string "a b" for the grammar input:
S->A B
A->a
B->b
It gives me false, and I think it should be true. I have an arraylist called AllGrammar that contains all the rules. For the example above it would contain:
[0]: S->A B
[1]: A->a
[2]: B->b
For the example S->hello and the input string hello it gives me true, as it should. More complex tests (more productions) give me false :S
public static boolean cyk(String entrada) {
int n = entrada.length();
int r = AllGrammar.size();
//Vector<String> startingsymbols = getSymbols(AllGrammar);
String[] ent = entrada.split("\\s");
n = ent.length;
System.out.println("length of entry" + n);
//let P[n,n,r] be an array of booleans. Initialize all elements of P to false.
boolean P[][][] = initialize3DVector(n, r);
//n-> number of words of string entrada,
//r-> number of nonterminal symbols
//This grammar contains the subset Rs which is the set of start symbols
for (int i = 1; i < n; i++) {
for(int j = 0; j < r; j++) {
String[] rule = (String[]) AllGrammar.get(j);
if (rule.length == 2) {
if (rule[1].equals(ent[i])) {
System.out.println("entrou");
System.out.println(rule[1]);
P[i][1][j + 1] = true;
}
}
}
}
for(int i = 2; i < n; i++) {
System.out.println("FIRST:" + i);
for(int j = 1; j < n - i + 1; j++) {
System.out.println("SECOND:" + j);
for(int k = 1; k < i - 1; k++) {
System.out.println("THIRD:" + k);
for(int g = 0; g < r; g++) {
String[] rule = (String[]) AllGrammar.get(g);
if (rule.length > 2) {
int A = returnPos(rule[0]);
int B = returnPos(rule[1]);
int C = returnPos(rule[2]);
System.out.println("A" + A);
System.out.println("B" + B);
System.out.println("C" + C);
if (A!=-1 && B!=-1 && C!=-1) {
if (P[j][k][B] && P[j + k][i - k][C]) {
System.out.println("entrou2");
P[j][i][A] = true;
}
}
}
}
}
}
}
for(int x = 0; x < r; x++) {
if(P[1][n][x]) return true;
}
return false;
}
Compared to the CYK algorithm:
you have indexing starting at 1, but Java arrays start at 0
the function returnPos() is not defined, and it's not obvious what it does.
It would seem the problems could be fairly basic ones in the use of indexes. If you are new to the language, you might want to get a refresher.
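To make the indexing point concrete, here is a rough 0-based sketch of CYK. It is not a corrected version of the code in the question. It assumes the grammar is already in Chomsky normal form, that each rule is a String[] of the form {lhs, terminal} or {lhs, rhs1, rhs2} (mirroring the rule.length checks above), and that the start symbol is passed in explicitly; the class name Cyk and the method parse are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Cyk {
    static boolean parse(String[] words, List<String[]> rules, String start) {
        int n = words.length;
        if (n == 0) return false;
        // Give every nonterminal (every left-hand side) an index 0..r-1.
        Map<String, Integer> index = new HashMap<>();
        for (String[] rule : rules) index.putIfAbsent(rule[0], index.size());
        if (!index.containsKey(start)) return false;
        int r = index.size();
        // p[len - 1][s][a]: nonterminal a derives the len words starting at word s.
        boolean[][][] p = new boolean[n][n][r];
        // Spans of length 1 come from terminal rules.
        for (int s = 0; s < n; s++) {
            for (String[] rule : rules) {
                if (rule.length == 2 && rule[1].equals(words[s])) {
                    p[0][s][index.get(rule[0])] = true;
                }
            }
        }
        // Longer spans are built from two shorter adjacent spans.
        for (int len = 2; len <= n; len++) {
            for (int s = 0; s + len <= n; s++) {
                for (int split = 1; split < len; split++) {
                    for (String[] rule : rules) {
                        if (rule.length != 3) continue;
                        Integer b = index.get(rule[1]);
                        Integer c = index.get(rule[2]);
                        if (b == null || c == null) continue;  // symbol never defined
                        if (p[split - 1][s][b] && p[len - split - 1][s + split][c]) {
                            p[len - 1][s][index.get(rule[0])] = true;
                        }
                    }
                }
            }
        }
        return p[n - 1][0][index.get(start)];
    }
}
```

With the rules S->A B, A->a, B->b and the words {"a", "b"}, parse returns true. Note that every loop bound is inclusive of the full length n, which is where the 1-based pseudocode is easy to mistranslate.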
