Condition to terminate BFS - java

I have an assignment where I am given a list of tuples, each containing 2 Strings, like this:
[ ("...","...") , ("...","...") , ("...","...") ... ]
I have to calculate the shortest path which will lead to an extreme-string.
An extreme-string is defined as a tuple of 2 strings where the first string is equal to the second string.
I know this might sound confusing so let me set an example.
Given :
The list [("0","100") , ("01","00") , ("110","11")]
With indices 0,1,2
The shortest path is : [2,1,2,0]
The extreme-string is equal to : "110011100"
Step by step explanation :
Starting with tuple of index 2 the initial string is : "110","11"
Appending tuple of index 1 next string is : "11001","1100"
Appending tuple of index 2 next string is : "11001110","110011"
Appending tuple of index 0 final string is : "110011100","110011100"
So say you begin with tuple ("X","Y") and then you pick tuple ("A","B") then result is ("XA","YB").
I approached this problem using BFS, which I have already implemented and which seems right to me, but there is one issue I am dealing with.
If the input is something like :
[("1","111")]
then the algorithm will never terminate, as it will always be in a state like "111..." - "111111111...".
Checking for this specific input is not a good idea, as there are many inputs that can reproduce this result.
Having an upper bound on the number of iterations is also not a good idea, because in some cases a finite result may actually exist beyond the iteration bound.
Any insight would be really useful.

Since it's an assignment I can't really solve it for you, but I'll try to give tips:
BFS sounds great to me as well.
One thing that differentiates BFS from, say, DFS is that you place the elements of level N into a queue (as opposed to a stack). Since a queue is FIFO, you'll process the elements of level N before the elements of level N + 1. So if a solution exists, this algorithm will reach the shortest one and finish (although it might occupy a lot of memory).
The interesting part is what exactly you put into the queue and how you organize the traversal algorithm. This is something that I feel you've already solved or at least you have a direction. So think about my previous paragraph and hopefully you'll come to the solution ;)
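One possible way to organize the queue, offered only as a sketch and not as the assignment's intended solution: each state keeps just the "overhang" (the part by which one string is longer than the other) plus the indices chosen so far. Branches where neither string is a prefix of the other can never become equal, so they are dropped, and an overhang that was already seen is not queued again. The class name `ExtremeStringBfs`, the `State` helper and the `maxDepth` cap are assumptions of this sketch; the cap is only there so the loop stops on inputs like [("1","111")], and as the question notes, a cap can miss a longer solution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

class ExtremeStringBfs {
    /** One BFS state: which side is longer, by which suffix, and the indices chosen so far. */
    private static final class State {
        final boolean topLeads;
        final String overhang;
        final List<Integer> path;

        State(boolean topLeads, String overhang, List<Integer> path) {
            this.topLeads = topLeads;
            this.overhang = overhang;
            this.path = path;
        }
    }

    /** Returns the shortest list of tuple indices, or null if none is found within maxDepth (hypothetical cap). */
    static List<Integer> shortestPath(List<String[]> tuples, int maxDepth) {
        Queue<State> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(new State(true, "", new ArrayList<>()));
        while (!queue.isEmpty()) {
            State s = queue.remove();
            if (s.path.size() >= maxDepth) {
                continue; // assumed cutoff, not part of the original problem
            }
            for (int i = 0; i < tuples.size(); i++) {
                // Only the unmatched suffix matters, so rebuild just that part.
                String top = (s.topLeads ? s.overhang : "") + tuples.get(i)[0];
                String bottom = (s.topLeads ? "" : s.overhang) + tuples.get(i)[1];
                boolean topLeads;
                String overhang;
                if (top.startsWith(bottom)) {
                    topLeads = true;
                    overhang = top.substring(bottom.length());
                } else if (bottom.startsWith(top)) {
                    topLeads = false;
                    overhang = bottom.substring(top.length());
                } else {
                    continue; // the strings already disagree, this branch is dead
                }
                List<Integer> path = new ArrayList<>(s.path);
                path.add(i);
                if (overhang.isEmpty()) {
                    return path; // both strings are equal: an extreme-string
                }
                if (seen.add((topLeads ? "T" : "B") + overhang)) {
                    queue.add(new State(topLeads, overhang, path));
                }
            }
        }
        return null;
    }
}
```

For the example list from the question, this sketch should return the indices 2, 1, 2, 0.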

Related

Depth first search or backtrack recursion for finding all possible combination of letters in a crossword puzzle/boggle board?

What would be the time complexity? I just want to avoid this being O(n!). Would using depth first search be time complexity O(n^2), as for each letter it may have to go through all other letters worst case?
I guess I'm not sure if I'm thinking about this the right way.
When I say use depth first search, I mean starting depth first search from the first letter, and then starting from the second letter, etc.
Is that necessary?
Note:
The original problem is to find all possible words in a crossword/boggle board. I'm thinking of using the trie data structure to find if a word is in the dictionary, but am thinking about ways of generating the words themselves.
Following the discussion above, here is my answer:
Definition: a trieX is a sub trie, with words of length X only.
Since we have a trie with all words in the desired language, we can also get the appropriate trieX.
We say that the crossword puzzle has w words, so we create an array w long where each entry is the root of a trieX, where X is the length of the relevant word. This gives us the list of possible words for each blank word.
Then we iterate over the intersections between words and eliminate words that cannot be placed. When no more changes are made, we stop.
Two remarks:
1. In order to improve performance, we start by adding either long words, or VERY short ones. What is short or long? Have a look at this and this.
2. Elimination of words from the trieX's can also be done by checking dependencies between words (if THIS word is here, then THAT word can't be there, etc.). This is more complicated, so if anyone wants to add some ideas on how to do this easily, please do.
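For the Boggle variant mentioned in the question (a trie for the dictionary plus depth first search from every letter), a minimal sketch could look like the following. It assumes a rectangular board, the usual Boggle rule that the eight neighbouring cells are adjacent, and that a cell may be used once per word; the names `BoggleSolver`, `TrieNode` and `findWords` are hypothetical. The trie is what keeps the search away from factorial blow-up in practice: a path is abandoned as soon as it is not a prefix of any dictionary word.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BoggleSolver {
    static final class TrieNode {
        final Map<Character, TrieNode> children = new HashMap<>();
        boolean isWord;
    }

    static TrieNode buildTrie(Iterable<String> dictionary) {
        TrieNode root = new TrieNode();
        for (String word : dictionary) {
            TrieNode node = root;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new TrieNode());
            }
            node.isWord = true;
        }
        return root;
    }

    /** Starts a DFS from every cell and collects every dictionary word found. */
    static Set<String> findWords(char[][] board, Iterable<String> dictionary) {
        Set<String> found = new TreeSet<>();
        if (board.length == 0) {
            return found;
        }
        TrieNode root = buildTrie(dictionary);
        boolean[][] used = new boolean[board.length][board[0].length];
        for (int r = 0; r < board.length; r++) {
            for (int c = 0; c < board[r].length; c++) {
                dfs(board, r, c, root, new StringBuilder(), used, found);
            }
        }
        return found;
    }

    private static void dfs(char[][] board, int r, int c, TrieNode node,
                            StringBuilder prefix, boolean[][] used, Set<String> found) {
        if (r < 0 || c < 0 || r >= board.length || c >= board[r].length || used[r][c]) {
            return;
        }
        TrieNode next = node.children.get(board[r][c]);
        if (next == null) {
            return; // no dictionary word starts with this prefix, so prune
        }
        used[r][c] = true;
        prefix.append(board[r][c]);
        if (next.isWord) {
            found.add(prefix.toString());
        }
        for (int dr = -1; dr <= 1; dr++) {
            for (int dc = -1; dc <= 1; dc++) {
                if (dr != 0 || dc != 0) {
                    dfs(board, r + dr, c + dc, next, prefix, used, found);
                }
            }
        }
        prefix.setLength(prefix.length() - 1); // backtrack
        used[r][c] = false;
    }
}
```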

Best approach to solve Word Chain

I am trying to solve this problem in CodeEval.
In this challenge we suggest you to play in the known game "Word
chain" in which players come up with words that begin with the letter
that the previous word ended with. The challenge is to determine the
maximum length of a chain that can be created from a list of words.
Example:
Input:
soup,sugar,peas,rice
Output:
4
Explanation: We can form a chain of 4 words like this: "soup->peas->sugar->rice".
Constraints:
The length of a list of words is in range [4, 35].
A word in a list of words is represented by a random lowercase ascii string with the length of [3, 7] letters.
There are no repeated words in a list of words.
My attempt: My approach is to model the words as a graph, such that each word in the input represents a node and there is a directed edge from word i to word j if the last character of word i is equal to the first character of word j.
After that I run BFS from each node and compute the distance to the farthest node from that node. The final result is the maximum of these values over all nodes.
But this approach is not giving me a full score. Hence, my question is how to solve this problem correctly and efficiently?
My reputation is less than 50, so I can't make a comment...
If the total number of words is less than 20, we can solve it using dynamic programming with a bitmask.
Make dp[20][1<<20], where dp[i][j] means you are currently at word i and have visited the words in bitmask j.
For more than 20 words, I still don't have a good idea. Maybe we need to use some randomized algorithm, perhaps...
My idea is to use DFS and add some optimization, because 35 is not too big. I think that's enough to solve the problem.
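The bitmask dynamic programming idea above (for at most 20 words) could be sketched as follows. To save memory, this version stores one int per bitmask whose bits mark the words a chain over exactly that subset can end with, instead of a full dp[20][1<<20] table; that compaction, the class name `WordChainDp` and the method `longestChain` are assumptions of this sketch. It also assumes every word is non-empty and counts a single word as a chain of length 1.

```java
import java.util.List;

class WordChainDp {
    /** Longest chain where each word starts with the last letter of the previous word. */
    static int longestChain(List<String> words) {
        int n = words.size();
        if (n > 20) {
            throw new IllegalArgumentException("the bitmask table is only practical for about 20 words");
        }
        // ends[mask] has bit i set if some chain uses exactly the words in mask and ends with word i.
        int[] ends = new int[1 << n];
        for (int i = 0; i < n; i++) {
            ends[1 << i] = 1 << i;
        }
        int best = 0;
        for (int mask = 1; mask < (1 << n); mask++) {
            if (ends[mask] == 0) {
                continue; // no valid chain covers exactly this subset
            }
            best = Math.max(best, Integer.bitCount(mask));
            for (int i = 0; i < n; i++) {
                if ((ends[mask] & (1 << i)) == 0) {
                    continue;
                }
                String current = words.get(i);
                char last = current.charAt(current.length() - 1);
                for (int j = 0; j < n; j++) {
                    // Extend the chain with an unused word that starts with the right letter.
                    if ((mask & (1 << j)) == 0 && words.get(j).charAt(0) == last) {
                        ends[mask | (1 << j)] |= 1 << j;
                    }
                }
            }
        }
        return best;
    }
}
```

Because adding a word always produces a numerically larger mask, visiting the masks in increasing order guarantees each subset is final before it is extended.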
See the solution mentioned here: Detecting when matrix multiplication is possible
The solution to your problem is pretty much the same. Create a directed graph such that for every word you add an edge from its first letter to its last letter.
Then find an Euler path ( http://en.wikipedia.org/wiki/Euler_path ) in that graph.
EDIT: I see that you are not assured of using all words and you need the longest path in the graph ( http://en.wikipedia.org/wiki/Longest_path_problem ). This problem is NP-complete.
See the solution mentioned in "word chain in core java".
The page gives a solution in Core Java that follows this process:
Load the Dictionary Items in memory for a given word length
Get the next eligible list of words from the memory for the given word
There is another approach using the Hadoop MapReduce framework, which is described in detail in "word chain using map-reduce".

What's the best way to iterate through all combinations of a multi-dimensional array of unknown sizes without repeating any combination?

ArrayList<ArrayList<ArrayList<String>>> one = new ArrayList<ArrayList<ArrayList<String>>>();
one would look something like this with some example values:
[
[
["A","B","C",...],
["G","E","J",...],
...
],
[
["1","2",...],
["8","5","12","7",...],
...
],
...
]
Assume that there will always be one base case, at least one letter arraylist (e.g. ["A","B","C"]), but there could be more (e.g. ["X","Y","Z"]), and there may be any number of number arraylists, maybe none at all, but possibly hundreds (e.g. ["1","2","3"],...,["997","998","999"]). Also, there could be more types of arraylists (e.g. ["#","#","$"]) of any size. So really the only thing that is definitive is that ALWAYS:
one.size()>=1
one.get(0).size()>=1
one.get(0).get(0).size()>=1
So the problem is: How can I best get every combination of each category without knowing how large each arraylist will be and without any repeats, assuming only that one.get(0).get(0) is valid? e.g. ["A","B","C",...] ["1","2",...] ..., ["A","B","C",...] ["8","5","12","7",...] .... I'm using Java in my project currently, but I can convert any algorithm that works myself. I apologize if this is not clear; I'm having a hard time putting it into words, which is probably part of why I can't think of a solution.
I know two solutions to this, the recursive and the non-recursive. Here's the non-recursive one (similar to the answer at "How to get 2D array possible combinations"):
1) Multiply the lengths of all the arrays together. This is the number of possible combinations you can make. Call this totalcombinations.
2) Set up an int[] array called counters. It should be as long as the number of arrays, with every entry initialized to 0.
3a) For totalcombinations times, concatenate the counters[0]th entry in arrays[0], the counters[1]th entry in arrays[1], etc., and add the result to the list of all results.
3b) Then set j = 0 and increment counters[j]. If this causes counters[j] == arrays[j].length (one past the last valid index), then set counters[j] = 0, ++j, and increment the new counters[j]; repeat until you do not get such an overflow.
If you imagine counters as being like the tumblers of a suitcase - when you overflow the first digit from 9 to 0, the next one ticks over - then you should get the strategy here.
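The counter steps above could be sketched in Java roughly like this. It assumes the input has already been flattened to one list per category (a `List<List<String>>`), collects each combination as a list rather than a concatenated string, and returns no combinations when there are no lists or when any list is empty; the names `OdometerCombinations` and `allCombinations` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class OdometerCombinations {
    static List<List<String>> allCombinations(List<List<String>> arrays) {
        List<List<String>> results = new ArrayList<>();
        // Step 1: the product of all sizes is the number of combinations.
        long totalCombinations = 1;
        for (List<String> array : arrays) {
            totalCombinations *= array.size();
        }
        if (arrays.isEmpty() || totalCombinations == 0) {
            return results;
        }
        // Step 2: one counter per array, all starting at 0.
        int[] counters = new int[arrays.size()];
        for (long n = 0; n < totalCombinations; n++) {
            // Step 3a: read off the entry each counter currently points at.
            List<String> combination = new ArrayList<>(arrays.size());
            for (int i = 0; i < counters.length; i++) {
                combination.add(arrays.get(i).get(counters[i]));
            }
            results.add(combination);
            // Step 3b: increment like an odometer, carrying into the next counter on overflow.
            int j = 0;
            while (j < counters.length && ++counters[j] == arrays.get(j).size()) {
                counters[j] = 0;
                j++;
            }
        }
        return results;
    }
}
```

With this ordering the first list changes fastest, so for [["A","B"],["1","2","3"]] the results start with [A, 1], [B, 1], [A, 2].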

Returning a Subset of Strings from 10000 ascii strings

My college is almost over, so I have started preparing for job interviews, and I came across this interview question while preparing:
1. You have a set of 10000 ascii strings (loaded from a file)
2. A string is input from stdin.
3. Write a pseudocode that returns (to stdout) a subset of strings in (1) that contain the same distinct characters (regardless of order) as input in (2). Optimize for time.
Assume that this function will need to be invoked repeatedly. Initializing the string array once and storing in memory is okay .
Please avoid solutions that require looping through all 10000 strings.
Can anyone provide me a general pseudocode/algorithm kind of thing how to solve this problem? I am scratching my head thinking about the solution. I am mostly familiar with Java.
Here is an algorithm whose lookup cost does not depend on the number of strings (it is O(1) in that sense)!
Initialization:
For each string, sort its characters and remove duplicates, e.g. "trees" becomes "erst".
Load the sorted key into a trie, and add a reference to the original string to the list of words stored at the node where the key ends.
Search:
Sort the input string and remove duplicates, the same way as during initialization.
Follow the trie using the characters of that key; at the end node, return all words referenced there.
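A minimal Java sketch of this trie lookup, under my reading that each original string is stored only at the node where its sorted, de-duplicated key ends (so a lookup returns exactly the strings with the same distinct characters). The class name `DistinctCharIndex` and the methods `key` and `lookup` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class DistinctCharIndex {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> words = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Initialization: one pass over all strings, done once. */
    DistinctCharIndex(Collection<String> strings) {
        for (String s : strings) {
            Node node = root;
            for (char c : key(s).toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.words.add(s);
        }
    }

    /** Sorted distinct characters, e.g. "trees" becomes "erst". */
    static String key(String s) {
        TreeSet<Character> distinct = new TreeSet<>();
        for (char c : s.toCharArray()) {
            distinct.add(c);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : distinct) {
            sb.append(c);
        }
        return sb.toString();
    }

    /** Search: walks at most one node per distinct input character, independent of how many strings are stored. */
    List<String> lookup(String input) {
        Node node = root;
        for (char c : key(input).toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return Collections.emptyList();
            }
        }
        return node.words;
    }
}
```

A `HashMap<String, List<String>>` keyed by the same sorted key would behave the same way for exact lookups; the trie is kept here only to mirror the description.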
They say optimise for time, so I guess we're safe to abuse space as much as we want.
In that case, you could do an initial pass over the 10000 strings and build a mapping from each unique character present in them to the set of indices of the strings that contain it. That way you can ask the mapping the question: which strings contain character 'x'? Call this mapping M. (Order: O(nm), where n is the number of strings and m is their maximum length.)
To optimise in time again, you could reduce the stdin input string to unique characters, and put them in a queue, Q. (order O(p), p is the length of the input string)
Start a new set, say S, and let S = M[Q.extractNextItem].
Now you can loop over the rest of the unique characters and find which strings contain all of them.
While (Q is not empty) (loops O(p)) {
S = S intersect M[Q.extractNextItem] (close to O(1) depending on your implementation of sets)
}
voila, return S.
Total time: O(mn + p + p*1) = O(mn + p)
(Still early in the morning here, I hope that time analysis was right)
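The mapping-and-intersection idea can be sketched with `java.util.BitSet`, one bit per string index. Note that, as described, the intersection yields every string that contains all of the input's characters, which may include strings with extra characters; an exact "same distinct characters" match would need one more filter over the (usually small) result. The class name `CharInvertedIndex` and the method `containingAll` are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CharInvertedIndex {
    private final List<String> strings;
    // The mapping M: character -> indices of the strings that contain it.
    private final Map<Character, BitSet> containing = new HashMap<>();

    CharInvertedIndex(List<String> strings) {
        this.strings = strings;
        for (int i = 0; i < strings.size(); i++) {
            for (char c : strings.get(i).toCharArray()) {
                containing.computeIfAbsent(c, k -> new BitSet()).set(i);
            }
        }
    }

    /** Strings that contain every character of the input (possibly along with others). */
    List<String> containingAll(String input) {
        List<String> result = new ArrayList<>();
        BitSet s = null;
        for (char c : input.toCharArray()) { // repeated characters are harmless here
            BitSet withC = containing.get(c);
            if (withC == null) {
                return result; // no stored string has this character
            }
            if (s == null) {
                s = (BitSet) withC.clone(); // copy so the index itself is never modified
            } else {
                s.and(withC);
            }
        }
        if (s == null) {
            return result; // empty input
        }
        for (int i = s.nextSetBit(0); i >= 0; i = s.nextSetBit(i + 1)) {
            result.add(strings.get(i));
        }
        return result;
    }
}
```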
As Bohemian says, a trie tree is definitely the way to go!
This sounds like the way an address book lookup would work on a phone. Start punching digits in, and then filter the address book based on the number representation as well as any of the three (or actually more if using international chars) letters that number would represent.

Find K max values from a N List

I have these requirements:
1. I have random values in a List/Array and I need to find the 3 max values.
2. I have a pool of values that gets updated, maybe every 5 seconds. After every update, I need to find the 3 max values in the pool.
I thought of using Math.max thrice on the list but I dont think it as a very optimized approach.
Won't any sorting mechanism be costly? I only care about the top 3 max values, so why sort all of them?
Please suggest the best way to do it in JAVA
Sort the list, get the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values.
Maintain the pool as a sorted collection.
Update: FYI Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation.
Ordering.greatestOf
Traverse the list once, keeping an ordered array of three largest elements seen so far. This is trivial to update whenever you see a new element, and instantly gives you the answer you're looking for.
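A single pass with a small ordered array might look like the sketch below. It is written for a general k, with k = 3 matching the question; the names `TopValues` and `largest` are hypothetical, and duplicates are kept (two 9s count as two of the largest values).

```java
class TopValues {
    /** Returns up to k largest values in descending order, scanning the input once. */
    static int[] largest(int[] values, int k) {
        int[] top = new int[Math.max(0, Math.min(k, values.length))];
        if (top.length == 0) {
            return top;
        }
        int size = 0; // how many slots of top are filled so far
        for (int v : values) {
            int pos;
            if (size < top.length) {
                pos = size++;            // still room: take the next free slot
            } else if (v > top[size - 1]) {
                pos = size - 1;          // beats the smallest kept value: replace it
            } else {
                continue;                // not among the k largest seen so far
            }
            // Shift smaller kept values down so the array stays sorted descending.
            while (pos > 0 && top[pos - 1] < v) {
                top[pos] = top[pos - 1];
                pos--;
            }
            top[pos] = v;
        }
        return top;
    }
}
```

For a fixed k such as 3, each element costs at most a few comparisons, so the whole scan is linear in the size of the list.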
A priority queue should be the data structure you need in this case.
First, it would be wise to never say again, "I dont think it as a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
Third, if you really get data so large that an O(n log n) operation is too slow, you should rewrite the data structure which holds the data in the first place. java.util.NavigableSet, for example, offers a .descendingIterator() method which you can probe for its first three elements; those would be the three maximum numbers. If you really want, a heap data structure can be used, and you can pull off the top 3 elements with only a few comparisons, at the cost of making each insertion an O(log n) operation.
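As a sketch of the NavigableSet suggestion, the pool can be kept in a `java.util.TreeSet` and read back through `descendingIterator()`. The wrapper class `SortedPool` and its method names are hypothetical. One caveat that follows from using a set: duplicate values are stored only once, so this fits only if repeated values should count once.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class SortedPool {
    private final NavigableSet<Integer> pool = new TreeSet<>();

    /** Each update costs O(log n) and keeps the pool sorted. */
    void add(int value) {
        pool.add(value);
    }

    /** Reads up to k values from the largest end of the sorted pool. */
    List<Integer> largest(int k) {
        List<Integer> result = new ArrayList<>();
        Iterator<Integer> it = pool.descendingIterator();
        while (it.hasNext() && result.size() < k) {
            result.add(it.next());
        }
        return result;
    }
}
```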
