Order a list of characters, given a dictionary


I was asked this question in an interview. Suppose you have an ordered dictionary and are given a list of unordered characters. How would you order these characters by precedence? This dictionary contains words where all 26 characters are guaranteed to appear. However, note that the size of the dictionary might be anything. The dictionary could be as small as a few words and may not have separate sections for each character, e.g., there might be no section for words beginning with a, although a will appear as part of another word, e.g., "bat".
The dictionary might be "ordered" (/sarcasm) as such: "zebra", "apple", "cat", "crass", and if you're given the list {a, z, r}, the correct order would be {z, a, r}. Since "zebra" is before "apple" in the dictionary, we know z comes before a in the precedence. Since "apple" comes before "cat", we know a comes before c. Since "cat" comes before "crass", we know that a comes before r. This ordering leaves c and r with ambiguous precedence, but since the list of letters was {a, z, r}, we know the solution to be {z, a, r}.

Use a directed graph with 26 vertices, each vertex represents a character. An edge from vertex A to vertex B means in the alphabet B is in front of A.
The first step is to establish such a graph with only vertices but NO edges.
Second, scan the input dictionary word by word, comparing each word with the previous one. Each comparison yields at most one relationship: the first position where the two words differ (if one word is a prefix of the other, you learn nothing). Add that relationship as an edge in the graph. Assuming the dictionary is correct, there should be no conflicts.
After you have finished the dictionary, output the alphabet as follows:
1. Pick a random vertex and traverse its path until you find the one character that points to nothing. This is the first character in the alphabet. Output it and delete it from the graph.
2. Keep doing step 1 until all vertices are deleted.
EDIT:
To better explain this algorithm, let's run it on your sample input.
Input: {"zebra", "apple", "cat", "crass"}
Word 0 and word 1, we immediately know that z comes before a, so we make an edge a->z
Word 1 and word 2, we immediately know that a comes before c, so we make another edge c->a
Word 2 and Word 3, the first letters are the same "c", but the second ones differ, so we learn that a comes before r, so we have another edge r->a
Now all the words are read. Output the order by picking a vertex at random (say we pick c); following its path gives c->a->z in the graph. Output z and delete z from the graph (mark it as NULL). Now pick another one (say we pick r); we find r->a in the graph. We output a and delete a from the graph. Now we pick another one (say we pick c again); no path is found, so we just output c and delete it. Now we pick the last one, r; again there is no path, so we output r and delete it. Since all vertices are deleted, the algorithm is done.
The output is z, a, c, r. The relative order of "c" and "r" is arbitrary, since the input does not tell us their relationship.
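The graph idea above can be sketched in Java roughly as follows. This is only an illustration under some assumptions: the class and method names (AlienOrder, order) are made up, edges point from the earlier letter to the later one (the reverse of the direction used in the answer), and the traversal is Kahn's topological sort rather than the "pick a random vertex and follow its path" procedure. Letters with no known constraints end up at arbitrary positions.

```java
import java.util.*;

final class AlienOrder {
    // Hypothetical helper: orders `letters` consistently with the sorted word list.
    static List<Character> order(List<String> words, char[] letters) {
        Map<Character, Set<Character>> after = new HashMap<>(); // c -> letters known to follow c
        Map<Character, Integer> inDegree = new HashMap<>();     // number of known predecessors
        for (String w : words) {
            for (char c : w.toCharArray()) {
                after.putIfAbsent(c, new LinkedHashSet<>());
                inDegree.putIfAbsent(c, 0);
            }
        }
        // Compare each word with the previous one; the first differing position gives one edge.
        for (int i = 1; i < words.size(); i++) {
            String prev = words.get(i - 1), cur = words.get(i);
            int n = Math.min(prev.length(), cur.length());
            for (int k = 0; k < n; k++) {
                char a = prev.charAt(k), b = cur.charAt(k);
                if (a != b) {
                    if (after.get(a).add(b)) {
                        inDegree.merge(b, 1, Integer::sum);
                    }
                    break;
                }
            }
        }
        // Kahn's algorithm: repeatedly output a letter with no remaining predecessors.
        Deque<Character> ready = new ArrayDeque<>();
        for (Map.Entry<Character, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<Character> alphabet = new ArrayList<>();
        while (!ready.isEmpty()) {
            char c = ready.poll();
            alphabet.add(c);
            for (char d : after.get(c)) {
                if (inDegree.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        // Keep only the requested letters, in alphabet order.
        Set<Character> wanted = new HashSet<>();
        for (char c : letters) wanted.add(c);
        List<Character> result = new ArrayList<>();
        for (char c : alphabet) {
            if (wanted.contains(c)) result.add(c);
        }
        return result;
    }
}
```

Calling AlienOrder.order(Arrays.asList("zebra", "apple", "cat", "crass"), new char[]{'a', 'z', 'r'}) should give [z, a, r], because z precedes a and a precedes r in the graph.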

From the fact that "zebra" < "apple" < "cat" < "crass", the most efficient way to derive the per-character relationships is to have a loop consider the Nth character of all words, where N is initially 0 yielding the relationships "z" < "a" < "c". That loop can recursively extract relationships for the (N + 1)th character for groups of words with the same prefix (i.e. text in positions <= N). Doing that for N == 1 with same-prefixed "cat" and "crass" yields the relationship "a" < "r".
We can represent known relationships in a 2 dimensional array of x < y truth values (blank cells are unknown).
y\x  a  b  c  ...  r  ...  z
a    -     N       N       Y
b       -
c    Y     -               Y
r    Y             -
z    N     N               -
The brute force approach is to iterate over all pairs of characters in the input list (i.e. {a, z, r} -> az, ar, zr) looking up the table for a<z, a<r, z<r: if this is ever false, then swap the characters and restart the whole she-bang. When you make it through the full process without having had to swap any more characters, the output is sorted consistently with the rules. This is a bit like doing a bubble sort.
To make this faster, we can be more proactive about populating cells in our table for implied relationships: for example, we know "z" < "a" < "c" and "a" < "r", so we deduce that "z" < "r". We could do this by running through the "naive" table above to find everything we know about each character (e.g. that z<a and z<c) - then run through what we know about a and c. To avoid excessively deep trees, you could just follow one level of indirection like this, then repeat until the table was stable.
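A rough Java sketch of the table approach follows. The names (PrecedenceTable, build, sort) are invented for illustration, the words are assumed to contain only lowercase a-z, and the implied relationships are filled in with Warshall's transitive closure instead of the "one level of indirection, repeat until stable" loop described above. The sort is the swap-and-restart brute force.

```java
import java.util.*;

final class PrecedenceTable {
    // less[x][y] == true means letter x is known to precede letter y.
    static boolean[][] build(List<String> words) {
        boolean[][] less = new boolean[26][26];
        for (int i = 1; i < words.size(); i++) {
            String p = words.get(i - 1), c = words.get(i);
            int n = Math.min(p.length(), c.length());
            for (int k = 0; k < n; k++) {
                if (p.charAt(k) != c.charAt(k)) {
                    less[p.charAt(k) - 'a'][c.charAt(k) - 'a'] = true;
                    break;
                }
            }
        }
        // Warshall's algorithm: if x < m and m < y, record x < y.
        for (int m = 0; m < 26; m++) {
            for (int x = 0; x < 26; x++) {
                if (!less[x][m]) continue;
                for (int y = 0; y < 26; y++) {
                    if (less[m][y]) less[x][y] = true;
                }
            }
        }
        return less;
    }

    // Brute force: swap any pair that contradicts the table, then start over.
    static char[] sort(char[] letters, boolean[][] less) {
        char[] out = letters.clone();
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i < out.length && !swapped; i++) {
                for (int j = i + 1; j < out.length && !swapped; j++) {
                    if (less[out[j] - 'a'][out[i] - 'a']) {
                        char t = out[i];
                        out[i] = out[j];
                        out[j] = t;
                        swapped = true;
                    }
                }
            }
        }
        return out;
    }
}
```

With the sample dictionary, the closure adds z < c and z < r to the directly observed z < a, a < c and a < r, and sorting {a, z, r} yields z, a, r.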

Based on how you describe the problem, your example is incorrect. Your answer should be {z,r,a}. Be that as it may, below is code that solves the problem. You can modify it to return an order different from my supposed {z,r,a}.
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

Set<Character> charPrecedence(List<String> dictionary, char[] letters) {
    // LinkedHashSet remembers insertion order; a plain HashSet would lose the order we build
    Set<Character> result = new LinkedHashSet<Character>();
    // since your characters are the 26 letters instead of the 256 chars
    // a bit vector won't do; you need a map or set
    Set<Character> alphabets = new HashSet<Character>();
    for (char c : letters)
        alphabets.add(c);
    // now get to work: record each requested letter the first time it appears
    for (String word : dictionary) {
        if (alphabets.isEmpty()) return result;
        for (char c : word.toCharArray()) {
            if (alphabets.remove(c))
                result.add(c);
        }
    }
    // since the dictionary is guaranteed to contain all the 26 letters,
    // per the problem statement, then at this point your work is done.
    return result;
}
Best case O(1); worst case O(n), where n is the number of characters in the dictionary, i.e., one particular letter appears only once and is the last character you check.

Related

All Possible Combinations and their Substitutions given a string

I am working on a problem:
Given a password represented as a string and a character map that contains common characters and substitutions, create a list of all possible password combinations that can be created.
Map: {a=#, s=$, E=3, i=!, o=0}
Password String: "password"
Possible outcomes
p#ssword
p#$sword
pa$sword
p#s$word
p#$$word
pa$$word
pas$word
p#ssw0rd
p#$sw0rd
pa$sw0rd
p#s$w0rd
p#$$w0rd
pa$$w0rd
pas$w0rd
passw0rd
My logic so far:
Replace everything except the index you are on. This solution gives the most results but not the complete result. It misses about 5 combinations.
My code so far: it doesn't work, and I haven't implemented the solution above.
for(int i = 0;i < arr.length;i++){
for(int j =i; j < arr.length ; j++){
if(map.containsKey(arr[j])){
arr[j] = map.get(arr[j]);
}
//list.add(arr);
}
list.add(arr);
}

Think of the password like a binary number as an incrementing counter:
0000
0001
0010
0011
0100...
Each bit can be 'on' or 'off'. The "bits" of your number are the substitutable letters, and in this case it makes more sense to do the incrementing from left to right.
At any given time, you're either checking a bit, or "carrying" an increment to the next bit. The letters that don't substitute always "carry" automatically. The letters that do substitute increment to the next "value" (you could do this with more than one substitution, like i = {i, I, l, 1}). If you would carry past the "highest value", reset the "bit" to the initial value and "carry" to the next bit that would change. If that bit also carries, keep moving to the right, but if you change a bit, return to the first bit and continue from there.
If the process would go past the end of the word, it would reset the entire value to the initial one so you don't need to print that one and you can terminate the algorithm.
EDIT (by asker request): Let's use your example to illustrate.
In your example, you have two values for five different 'common' characters: 'a', 'e', 'i', 'o', and 's'. Think of these as a set of ordered lists of 'changeable' characters, such as { <'a', '#'>, <'e', '3'>, ... } and so on. If a letter doesn't have any alternatives, pretend it's in the set too, but it's just a list with a single character in it, like <'p'>. The reason the list is ordered is so that you can go through it systematically, which allows you to know when you've tried all the values. The order doesn't matter, but if it helps you can imagine the "base" character is the common version.
One way to hold that information would be with a map from each possible character (like 'a', '#', 'b', and so on) to a short array that lists the ones you want to interchange. If there's only one version of a character, you don't need to add it. So in the mapping, both 'a' and '#' would be keys for the same sequence of characters <'a', '#'>, and 'p' wouldn't need to be defined at all.
The first string of characters is "password", which you also store as the first word in your set of possible passwords. You also start with an index pointing to the first character in the string.
The algorithm does this. Look at the current index. Check if the value is in the map, and is not the last value in the list the mapping points to: e.g., if the list you find in the map for 'a' or '#' is <'a', '#'>, then 'a' is not the last value, but '#' is the last value.
If it's in the map but it isn't the last value in its list, change the character to the next value in that list: so, 'a' would become '#'. Then set the counter back to zero (even if it's already zero) and continue the loop. Also, every time you return to index zero, you must store the modified string in your set of passwords because it is now a new possible password.
If it IS at the last value (or if it's not in the map, in which case it's always at the last value because there's only one version), you have to carry instead. Change the character back to the first value in the list (so '#' becomes 'a' again, and 'p' doesn't change), and increment the index. Don't add the current word to the set of passwords, though! You aren't done changing it until it returns to index 0, and it's currently the same as a password you've already added.
If the index would go past the end of the string, you've collected all possible passwords (and the string should now be the same as it was initially) so you can stop.
So in your example, we start by looking at the index and see that index 0 is not past the end of the string. Then we look at 'p' and it isn't found in the map, so we 'carry': increment the index and repeat the loop. Now we're at index 1 and see 'a', which is in the map, pointing to <'a', '#'>. Since there's at least one character after 'a' in that list, we change 'a' to the next character '#'. Making any change other than a carry means we store this new password, and return to index 0 to repeat the loop. (The next time we see index 1, we'll change it back to 'a', but not store the password because we need to 'carry' and continue walking through the string until we find something new to change or reach the end.)
(Note that I simplified this with the assumption that you always start testing with the "least" character in the list, so you'd test "password" as an initial string, not "p#s$word". It's not too difficult to modify it to work in the other cases, but may be more efficient to just do a first pass through the test string and set it to all 'base' characters, then take that as the initial password.)
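The counter walk described above might look like this in Java. It is a simplified sketch, not the only way to do it: the names (PasswordVariants, variants) are invented, each character is assumed to have at most one substitute (so every "digit" has one or two values), and the input string is assumed to consist of "base" characters only.

```java
import java.util.*;

final class PasswordVariants {
    // Returns the base password followed by every substituted variant.
    static List<String> variants(String base, Map<Character, Character> subs) {
        char[] cur = base.toCharArray();
        List<String> out = new ArrayList<>();
        out.add(base);
        int i = 0;
        while (i < cur.length) {
            char original = base.charAt(i);
            Character alt = subs.get(original);
            boolean canIncrement = alt != null
                    && alt.charValue() != original
                    && cur[i] == original;
            if (canIncrement) {
                cur[i] = alt;              // move this "digit" to its next value
                out.add(new String(cur));  // any change other than a carry is a new password
                i = 0;                     // go back to the first position
            } else {
                cur[i] = original;         // reset this "digit" and carry
                i++;
            }
        }
        return out;
    }
}
```

For "password" with the map {a=#, s=$, E=3, i=!, o=0}, four positions are substitutable (a, s, s, o), so the list holds 2^4 = 16 strings: the original plus the 15 variants from the question.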

How many ways can we divide a list such that each group has the same first and last members?

I have an array of String called x that holds either "Boy" or "Girl". I want to create N groups without changing the order of x. Each group must have a minimum of 2 Strings, and the first member and the last member of the group MUST be the same String (both "Boy" or both "Girl"). Return the number of ways to create the groups.
Example:
String[] x = {"Boy", "Boy", "Girl", "Boy", "Girl", "Girl", "Boy", "Boy"}; // Can be any length
int N = 3; // N <= x.length/2
groupCount(x, N); // Returns 2
Explanation:
The two possibilities to divide x into three groups are:
["Boy", "Boy"], ["Girl", "Boy", "Girl", "Girl"], ["Boy", "Boy"]
["Boy", "Boy", "Girl", "Boy"], ["Girl", "Girl"], ["Boy", "Boy"]
How do I implement groupCount? I've wrapped my head around it for a while. Thanks.

TLDR; See this implementation.
It seems to be an exercise in recursion.
Algorithm: Read the list and select the first part where the first and the ith elements are the same. Remove this part from the array and decrease the number of required splits. Repeat until the number is 0 or the array is empty. If the array is empty and the group count is 0, then it is a valid split. In addition, you have to try longer groups as well.
Challenges:
Find all valid splits. This means your method should return a collection of solutions. So your input is a String[] and your return type is a String[][][] (indexes: element, group, solution). This data structure is hard to understand, so it was wrapped in the example; see ArraySlice and ArraySlices.
The recursion needs one more argument, which is the prefix. So you have to wrap the recursive function inside your service, and the recursive implementation cannot be called from outside your implementation.
Project was managed with Maven and tests were added for splitting the array. App.java can be run to demonstrate the application.
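If only the count is needed (as in the question) rather than the list of splits, the recursion can be sketched without the ArraySlice wrappers. This is a separate, minimal illustration and not the linked implementation: the class name GroupCounter is invented, the result is returned as a long, memoization is added, and every element of x is assumed to belong to exactly one group.

```java
import java.util.Arrays;

final class GroupCounter {
    // Counts the ways to cut x into exactly n consecutive groups, each of
    // length >= 2 and each starting and ending with the same string.
    static long groupCount(String[] x, int n) {
        long[][] memo = new long[x.length + 1][n + 1];
        for (long[] row : memo) Arrays.fill(row, -1);
        return count(x, 0, n, memo);
    }

    private static long count(String[] x, int start, int groupsLeft, long[][] memo) {
        if (groupsLeft == 0) return start == x.length ? 1 : 0; // valid only if nothing is left over
        if (start >= x.length) return 0;                       // groups still required but no elements
        if (memo[start][groupsLeft] >= 0) return memo[start][groupsLeft];
        long total = 0;
        // Try every possible last element of the group that begins at `start`.
        for (int end = start + 1; end < x.length; end++) {
            if (x[start].equals(x[end])) {
                total += count(x, end + 1, groupsLeft - 1, memo);
            }
        }
        memo[start][groupsLeft] = total;
        return total;
    }
}
```

For the array from the question and N = 3 this returns 2, matching the two splits listed there.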

Java sorted data structure that allows for logarithmic time removal of values within a range

I was wondering if there's an interface in the Java built in libraries that implements a data structure which is ordered and supports removal over a range. For example, if we call the data structure S (let's say of Integers), I'd like to be able to search for and remove the subset Q of S such that Q consists of all elements in S in the range [start, end] in O(|Q| log |S|) time.
I know in C++, there is an erase method to the Set interface, but it doesn't seem like Java's TreeSet has something similar. Can anyone help? Thanks!

SortedSet.subSet returns a view, which you can then clear().
For example:
TreeSet<String> set = new TreeSet<>(Arrays.asList("A", "B", "C", "D", "E"));
System.out.println(set); // [A, B, C, D, E]
set.subSet("B", "D").clear(); // Inclusive of B, exclusive of D.
System.out.println(set); // [A, D, E]
(The documentation of SortedSet describes how to modify the bounds of subSet to be exclusive and inclusive, respectively, at least for String).
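For the Integer case in the question, the four-argument subSet overload on NavigableSet (which TreeSet implements) lets you state inclusiveness directly. A small sketch, with the wrapper name RangeRemoval.removeRange made up for illustration:

```java
import java.util.NavigableSet;
import java.util.TreeSet;

final class RangeRemoval {
    // Removes every element in the closed range [start, end] from the set.
    // subSet throws IllegalArgumentException if start > end.
    static void removeRange(NavigableSet<Integer> set, int start, int end) {
        set.subSet(start, true, end, true).clear();
    }

    public static void main(String[] args) {
        NavigableSet<Integer> s = new TreeSet<>();
        for (int i = 1; i <= 9; i += 2) s.add(i); // 1, 3, 5, 7, 9
        removeRange(s, 3, 7);
        System.out.println(s); // [1, 9]
    }
}
```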

I don't know of any interfaces/libraries, but you could try using a histogram like structure...
For example, let's say we know our structure will only hold integers between min and max inclusive. Then we can simply make an array in the following manner...
int[] hist = new int[max - min + 1];
If we want to add a number i to the histogram, we can simply do...
hist[i - min]++;
Each index in the array represents a number in the predefined range. Each value in the array represents the number of occurrences of a number in the range.
This structure is extremely powerful since it allows for constant time addition, removal, and search.
Let's say we want to remove all elements in the inclusive range Q = [s, e]. We can run the following linear loop...
for (int i = s; i <= e; i++) {
hist[i - min] = 0;
}
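This runs in O(|Q|), where |Q| = e - s + 1 is the width of the range, regardless of how many elements it actually contains.
Lastly, if we wanted to make the above structure an ordered set instead of an ordered sequence, we could change our array to hold booleans instead of integers.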
For more info check out https://en.wikipedia.org/wiki/Counting_sort.
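The boolean variant can be sketched with java.util.BitSet, which already offers a range clear. The class name BoundedIntSet and its methods are hypothetical, and values are assumed to stay within [min, max]:

```java
import java.util.BitSet;

final class BoundedIntSet {
    private final int min;
    private final BitSet present;

    BoundedIntSet(int min, int max) {
        this.min = min;
        this.present = new BitSet(max - min + 1);
    }

    void add(int v) { present.set(v - min); }

    boolean contains(int v) { return present.get(v - min); }

    // Removes every value in the closed range [s, e];
    // BitSet.clear(from, to) treats `to` as exclusive, hence the + 1.
    void removeRange(int s, int e) { present.clear(s - min, e - min + 1); }

    int size() { return present.cardinality(); }
}
```

Note that this trades memory proportional to max - min for the fast operations, so it only suits reasonably narrow value ranges.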

Find shape within rows of characters-Java

I came across to problem on thinking how to get the shape from the rows of characters, for example given this input:
AAAAAAAA
ABBBAABA
ABABABBB
ABBBAAAA
AAAABBAA
ABBABBAA
ABBABBAA
ABAABBAA
Task:
Calculate how many shapes are there formed by horizontally or vertically adjacent 'B' letters.
In this example there are 4 such shapes.
It might be easier to see if I remove the 'A's:
BBB B
B B BBB
BBB
BB
BB BB
BB BB
B BB
Additional task:
How many 'B' characters are in each shape?
In this example: 8, 4, 5, 8 (not in any particular order).
I am still new to Java, so I want to ask is there is any java function that can check the same occurrence that near to each other in order to count the shape appears? Hope you can give me some pointer on how to construct this algorithm.
(I have thought of taking each index of 'B' and check whether they are near to other 'B' but I get stuck)

I am still new to Java, so I want to ask is there is any java function that can check the same occurrence that near to each other in order to count the shape appears?
No, there is no built-in functionality to make this very easy.
And that's the point of the exercise.
Here are some example approaches you could use to solve this problem:
Flood fill:
Convert the input to a matrix. It could be a boolean[][] where you set true where the input is B.
Iterate over the values in the matrix, skipping false values.
When you find a true value, initiate the flood fill:
Increment the count of shapes (you found a new shape)
Recursively replace all adjacent true values with false, incrementing shapeSize count as you go
When all true neighbors (and neighbors of neighbors, and so on) are replaced with false, the shape is fully explored
Continue the iteration where you left off, until you find another true value
Graph theory: find connected components
Convert the input to an undirected graph:
The index of each character can be the vertex id
Create a connection for adjacent pairs of B in the input
Iterate over the vertices of the graph
If a vertex doesn't have yet a component id, use depth-first search to find all the vertices connected to it, assign to all vertices the next component id, incrementing the componentSize count as you go
When there are no more connected vertices, the shape is fully explored
Continue the iteration where you left off, until you find another vertex with no component id
Union find: this is similar to finding the connected components
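As an illustration of the flood-fill option, here is a minimal Java sketch. The names (ShapeCounter, shapeSizes) are invented, and it uses an explicit stack rather than recursion so that large shapes cannot overflow the call stack; otherwise it follows the steps listed above.

```java
import java.util.*;

final class ShapeCounter {
    // Returns the size of each shape of 'B' cells, in the order the shapes
    // are first met when scanning row by row.
    static List<Integer> shapeSizes(String[] rows) {
        int h = rows.length;
        boolean[][] isB = new boolean[h][];
        for (int r = 0; r < h; r++) {
            isB[r] = new boolean[rows[r].length()];
            for (int c = 0; c < isB[r].length; c++) {
                isB[r][c] = rows[r].charAt(c) == 'B';
            }
        }
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}}; // no diagonals
        List<Integer> sizes = new ArrayList<>();
        for (int r = 0; r < h; r++) {
            for (int c = 0; c < isB[r].length; c++) {
                if (!isB[r][c]) continue;
                // Found an unexplored shape: flood it, clearing cells as we go.
                int size = 0;
                Deque<int[]> stack = new ArrayDeque<>();
                stack.push(new int[]{r, c});
                isB[r][c] = false;
                while (!stack.isEmpty()) {
                    int[] cell = stack.pop();
                    size++;
                    for (int[] d : dirs) {
                        int nr = cell[0] + d[0], nc = cell[1] + d[1];
                        if (nr >= 0 && nr < h && nc >= 0 && nc < isB[nr].length && isB[nr][nc]) {
                            isB[nr][nc] = false;
                            stack.push(new int[]{nr, nc});
                        }
                    }
                }
                sizes.add(size);
            }
        }
        return sizes;
    }
}
```

The number of shapes is the size of the returned list. For the grid in the question this gives four shapes with sizes 8, 4, 8 and 5 (in scan order).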

The method indexOf(int ch) returns the index of the first occurrence of a character in a String. You should cut each string after getting the index of the first B.
Or you can use indexOf(int ch, int fromIndex), which returns the index within the string of the first occurrence of the specified character, starting the search at the specified index.

int indexCount = 0;
ArrayList<Integer> list = new ArrayList<Integer>();
for(int a = 0; a < count; a++) {
indexCount = array[a].indexOf('B');
list.add(indexCount);
}
System.out.println(list);

How to find repeated sequences of events

I'm trying to find an efficient algorithm for identifying a reoccurring sequence of characters. Let's say the sequence could be a minimum of 3 characters, yet only returns the maximum length sequence. The dataset could potentially be thousands of characters. Also, I only want to know about the sequence if it's repeated, lets say, 3 times.
As an example:
ASHEKBSHEKCSHEDSHEK
"SHEK" occurs 3 times and would be identified. "SHE" occurs 4 times, but isn't identified since "SHEK" is the maximum length sequence that contains that sequence.
Also, no "seed" sequence is fed to the algorithm, it must find them autonomously.
Thanks in advance,
j

Try building a suffix array for the string.
Online builder: http://allisons.org/ll/AlgDS/Strings/Suffix/
Then check the beginnings of consecutive lines in the suffix array for matches.
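A naive Java sketch of that idea follows. The names (RepeatedSequence, longestRepeated) are hypothetical; the suffix array is built by plain sorting of substrings, which is simple but far from the fastest construction, and overlapping occurrences are counted as repeats. After sorting, any string that occurs at least minCount times is a common prefix of minCount consecutive suffixes, so it is enough to compare the first and last suffix of each such window.

```java
import java.util.Arrays;

final class RepeatedSequence {
    // Longest substring of length >= minLen occurring at least minCount times
    // (minCount >= 1 assumed); returns "" if there is none.
    static String longestRepeated(String s, int minLen, int minCount) {
        Integer[] suffixes = new Integer[s.length()];
        for (int i = 0; i < suffixes.length; i++) suffixes[i] = i;
        // Naive suffix array: sort start offsets by the suffix they begin.
        Arrays.sort(suffixes, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        String best = "";
        for (int i = 0; i + minCount - 1 < suffixes.length; i++) {
            int first = suffixes[i], last = suffixes[i + minCount - 1];
            int len = commonPrefix(s, first, last);
            if (len >= minLen && len > best.length()) {
                best = s.substring(first, first + len);
            }
        }
        return best;
    }

    private static int commonPrefix(String s, int a, int b) {
        int n = 0;
        while (a + n < s.length() && b + n < s.length() && s.charAt(a + n) == s.charAt(b + n)) {
            n++;
        }
        return n;
    }
}
```

For the example string with minLen = 3 and minCount = 3 this returns "SHEK".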

This looks like a job for Rabin-Karp; see its Wikipedia entry.

If you consider that there exist \sum(n) / 2 possible starting strings, and you aren't looking for simply a match, but the substring with the most matches, I think your algorithm will have a terrible theoretical complexity if it is to be correct and complete.
However, you might get some practical speed using a Trie. The algorithm would go something like this:
For each offset into the string...
  For each sub-string length starting at that offset...
    Insert the sub-string into the trie. Each node in the trie has a data value (a "count" integer) that you increment when you visit the node.
Once you've built-up the trie to model your data, delete all the sub-trees from the trie with roots below some optimization threshold (3 in your case).
Those remaining paths should be few enough in number for you to efficiently sort-and-pick the ones you want.
I suggest this as a starting point because the Trie is built to manipulate common prefixes and, as a side-effect, will compress your dataset.
A personal choice I would make would be to identify the location of the sub-strings as a separate process after identifying the ones I want. Otherwise you are going to store every substring location, and that will explode your memory. Your computation is already pretty complex.
Hope this makes some sense! Good luck!
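A compact Java sketch of the trie idea, under a few assumptions: the names (SubstringTrie, longestFrequent) are made up, the pruning step is done implicitly by skipping low-count children during the search instead of deleting sub-trees, and no substring locations are stored. The trie holds every substring, so memory grows quadratically with the input in the worst case.

```java
import java.util.Map;
import java.util.TreeMap;

final class SubstringTrie {
    private static final class Node {
        int count; // how many substring insertions passed through this node
        final Map<Character, Node> children = new TreeMap<>();
    }

    static String longestFrequent(String s, int minLen, int minCount) {
        Node root = new Node();
        // Walking from each offset to the end inserts all substrings starting there.
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int i = start; i < s.length(); i++) {
                node = node.children.computeIfAbsent(s.charAt(i), k -> new Node());
                node.count++;
            }
        }
        String best = deepest(root, new StringBuilder(), minCount);
        return best.length() >= minLen ? best : "";
    }

    // Depth-first search that ignores sub-trees below the count threshold.
    private static String deepest(Node node, StringBuilder path, int minCount) {
        String best = path.toString();
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (e.getValue().count < minCount) continue;
            path.append(e.getKey().charValue());
            String candidate = deepest(e.getValue(), path, minCount);
            if (candidate.length() > best.length()) best = candidate;
            path.setLength(path.length() - 1);
        }
        return best;
    }
}
```

On the example string with a minimum length of 3 and a threshold of 3 occurrences, the deepest surviving path spells "SHEK".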

Consider the following algorithm, where:
str is the string of events
T(i) is the suffix tree for the substring str(0..i).
T(i+1) is quickly obtained from T(i), for example using this algorithm
For each character position i in the input string str, traverse a
path starting at the root of T(i), along edges, labeled with
successive characters from the input, beginning from position i + 1.
This path determines a repeating string. If the path is longer than
the previously found paths, record the new max length and the position
i + 1.
Update the suffix tree with str [i+1] and repeat for the next position.
Something like this pseudocode:
max.len = 0
max.off = -1
T = update_suffix_tree (nil, str [0])
for i = 1 to len (str)
    r = root (T)
    j = i + 1
    while j < len (str) and r.child (str [j]) != nil
        r = r.child (str [j])
        ++j
    if j - i - 1 > max.len
        max.len = j - i - 1
        max.off = i + 1
    T = update_suffix_tree (T, str [i+1])
In the kth iteration, the inner while is executed for at most n -
k iterations and the suffix tree construction is O(k), hence the
loop body's complexity is O(n) and it's executed n-1 times,
therefore the whole algorithm complexity is O(n^2).
