All Possible Combinations and their Substitutions given a string - java

I am working on a problem:
Given a password represented as a string and a character map that contains common characters and substitutions, create a list of all possible password combinations that can be created.
Map: {a=#, s=$, E=3, i=!, o=0}
Password String: "password"
Possible outcomes
p#ssword
p#$sword
pa$sword
p#s$word
p#$$word
pa$$word
pas$word
p#ssw0rd
p#$sw0rd
pa$sw0rd
p#s$w0rd
p#$$w0rd
pa$$w0rd
pas$w0rd
passw0rd
My logic so far:
Replace everything except the index you are on. This solution gives the most results, but not the complete result: it misses about 5 combinations.
My code so far (it doesn't work, and I haven't implemented the solution above):
for (int i = 0; i < arr.length; i++) {
    for (int j = i; j < arr.length; j++) {
        if (map.containsKey(arr[j])) {
            arr[j] = map.get(arr[j]);
        }
        //list.add(arr);
    }
    list.add(arr);
}

Think of the password like a binary number as an incrementing counter:
0000
0001
0010
0011
0100...
Each bit can be 'on' or 'off'. The "bits" of your number are the substitutable letters, and in this case it makes more sense to do the incrementing from left to right.
At any given time, you're either checking a bit, or "carrying" an increment to the next bit. The letters that don't substitute always "carry" automatically. The letters that do substitute increment to the next "value" (you could do this with more than one substitution, like i = {i, I, l, 1}). If you would carry past the "highest value", reset the "bit" to the initial value and "carry" to the next bit that would change. If that bit also carries, keep moving to the right, but if you change a bit, return to the first bit and continue from there.
If the process would go past the end of the word, it would reset the entire value to the initial one so you don't need to print that one and you can terminate the algorithm.
EDIT (by asker request): Let's use your example to illustrate.
In your example, you have two values for each of five different 'common' characters: 'a', 'e', 'i', 'o', and 's'. Think of these as a set of ordered lists of 'changeable' characters, such as { <'a', '#'>, <'e', '3'>, ... } and so on. If a letter doesn't have any alternatives, pretend it's in the set too, but it's just a list with a single character in it, like <'p'>. The reason each list is ordered is so that you can go through it systematically, which allows you to know when you've tried all the values. The particular order doesn't matter, but if it helps you can imagine the "base" character is the common version.
One way to hold that information would be with a map from each possible character (like 'a', '#', 'b', and so on) to a short array that lists the ones you want to interchange. If there's only one version of a character, you don't need to add it. So in the mapping, both 'a' and '#' would be keys for the same sequence of characters <'a', '#'>, and 'p' wouldn't need to be defined at all.
The first string of characters is "password", which you also store as the first word in your set of possible passwords. You also start with an index pointing to the first character in the string.
The algorithm does this. Look at the current index. Check if the value is in the map, and is not the last value in the list the mapping points to: e.g., if the list you find in the map for 'a' or '#' is <'a', '#'>, then 'a' is not the last value, but '#' is the last value.
If it's in the map but it isn't the last value in its list, change the character to the next value in that list: so, 'a' would become '#'. Then set the index back to zero (even if it's already zero) and continue the loop. Also, every time you return to index zero, you must store the modified string in your set of passwords because it is now a new possible password.
If it IS at the last value (or if it's not in the map, in which case it's always at the last value because there's only one version), you have to carry instead. Change the character back to the first value in the list (so '#' becomes 'a' again, and 'p' doesn't change), and increment the index. Don't add the current word to the set of passwords, though! You aren't done changing it until it returns to index 0, and it's currently the same as a password you've already added.
If the index would go past the end of the string, you've collected all possible passwords (and the string should now be the same as it was initially) so you can stop.
So in your example, we start by looking at the index and see that index 0 is not past the end of the string. Then we look at 'p' and it isn't found in the map, so we 'carry': increment the index and repeat the loop. Now we're at index 1 and see 'a', which is in the map, pointing to <'a', '#'>. Since there's at least one character after 'a' in that list, we change 'a' to the next character '#'. Making any change other than a carry means we store this new password, and return to index 0 to repeat the loop. (The next time we see index 1, we'll change it back to 'a', but not store the password because we need to 'carry' and continue walking through the string until we find something new to change or reach the end.)
(Note that I simplified this with the assumption that you always start testing with the "least" character in the list, so you'd test "password" as an initial string, not "p#s$word". It's not too difficult to modify it to work in the other cases, but may be more efficient to just do a first pass through the test string and set it to all 'base' characters, then take that as the initial password.)
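The walk-through above can be sketched in Java roughly as follows. This is only a sketch under some assumptions: the class name PasswordVariants and the helper cycles are made up for illustration, each group string such as "a#" lists one character's versions in order, and the input is assumed to start with all 'base' characters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PasswordVariants {
    // Hypothetical helper: "a#" maps both 'a' and '#' to the ordered list <'a', '#'>.
    static Map<Character, char[]> cycles(String... groups) {
        Map<Character, char[]> map = new HashMap<>();
        for (String group : groups) {
            char[] versions = group.toCharArray();
            for (char c : versions) {
                map.put(c, versions);
            }
        }
        return map;
    }

    static List<String> variants(String password, Map<Character, char[]> cycles) {
        char[] word = password.toCharArray();
        List<String> out = new ArrayList<>();
        out.add(password);                      // the initial string is the first password
        int i = 0;
        while (i < word.length) {               // running off the end means we are done
            char[] versions = cycles.get(word[i]);
            int pos = versions == null ? -1 : indexOf(versions, word[i]);
            if (versions != null && pos < versions.length - 1) {
                word[i] = versions[pos + 1];    // "increment" this character
                out.add(new String(word));      // a change was made: store the new password
                i = 0;                          // and return to the first character
            } else {
                if (versions != null) {
                    word[i] = versions[0];      // reset to the first value
                }
                i++;                            // and "carry" to the next character
            }
        }
        return out;
    }

    private static int indexOf(char[] versions, char c) {
        for (int k = 0; k < versions.length; k++) {
            if (versions[k] == c) return k;
        }
        return -1;
    }
}
```

Calling variants("password", cycles("a#", "s$", "e3", "i!", "o0")) returns the original string plus every substituted form, so the list in the question corresponds to everything after the first element.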

Related

Comparing values difficulty with counter

I am reading a text and calculating how many times each character has occurred in the text. To do this I am using an ArrayList: whenever a character is added that is already in my ArrayList, I increment a counter. So at the end of the method I'm able to print the letters of the alphabet contained within the text, each matched with its corresponding number of occurrences.
for(int i; i < text.length; i++)
    counter = 0
    if arraylist already contains the character then continue
    otherwise add the character to the array
    for j; j < text.length; j++
        if index of text(j) and text(i) == the same
            counter++
    system out print arraylist[i] + counter
This is pseudo code to give you an idea of how my program works, I don't want to post the actual code up as it is assessed and I'm conscious about people using it.
So, I'm looking for a way to identify how to find the highest and lowest letters which have occurred. I'm struggling for ideas unless I pass on both the counter and index of array list character to some sort of data structure such as a hashmap =/ I feel like I must really be overthinking it though, unless the way I've structured my program isn't the best for what I'm trying to do. Because obviously I can't compare the counters each loop? .... questioning whether having a hashmap may be better and worth restarting everything.
Anyway, any suggestions welcome! ( this is assessed so please don't give an answer, but more of a possibility for how it could be approached )
Try using a hashmap, i.e.
HashMap<Character, Integer> charMap;
Where the Integer is the count you would like to keep track of. Populate your hashmap with the appropriate characters. After that, you can look up a character's current count with the get method (for example charMap.get('a')) and put it back increased by 1.
After you're done iterating through the characters, you can iterate through the hashmap to determine the character with the lowest/highest frequency.
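As a rough illustration of that idea only (the class and method names here are invented, and counting letters only is an assumption), the counting and the final pass over the map might look like this:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class LetterCounts {
    // Count how often each letter occurs; non-letters are skipped (an assumption).
    static Map<Character, Integer> count(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                counts.merge(c, 1, Integer::sum);   // insert 1, or add 1 to the existing count
            }
        }
        return counts;
    }

    // Iterate over the entries once to find the highest count.
    static char mostFrequent(Map<Character, Integer> counts) {
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Iterate over the entries once to find the lowest count.
    static char leastFrequent(Map<Character, Integer> counts) {
        return Collections.min(counts.entrySet(), Map.Entry.comparingByValue()).getKey();
    }
}
```

Both lookups throw NoSuchElementException on an empty map, and ties are broken arbitrarily, so a real program would need to decide what to do in those cases.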

Comparing sets of randomly assigned codes in Java to assign a name

Good day,
I honestly do not know how to phrase the problem in the title, thus the generic description. Actually I have a set of ~150 codes, which are combined to produce a single string, like this "a_b_c_d". Valid combinations contain 1-4 code combinations plus the '-' character if no value is assigned, and each code is only used once( "a_a..." is not considered valid). These sets of codes are then assigned to a unique name. Not all combinations are logical, but if a combination is valid then the order of the codes does not matter (if "f_g_e_-" is valid, then "e_g_f_-","e_f_-_ g_" is valid, and they all have the same name). I have taken the time and assigned each valid combination to its unique name and tried to create a single parser to read these values and produce the name.
I think the problem is apparent. Since the order does not matter, I have to check for every possible combination. The codes cannot be strictly ordered, since there are some codes that have meaning in any position. So, this is impossible to accomplish with a simple parser. Is there an optimal way to do this, or will I have to force the user to use some kind of order against the standard?
Try using a TreeMap to store each code (String) and its count (int). Increment the count for a code every time it is encountered in the string.
After processing the whole string, if the count for any code is > 1 then the string has repeated codes and is invalid; otherwise it is valid.
Traversing a TreeMap visits the keys in sorted order, so traverse the TreeMap to generate a code sequence that is always sorted the same way.
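A minimal sketch of that suggestion, assuming codes are separated by '_' and that '-' marks an empty slot as in the question (the class name CodeKey and the method canonical are hypothetical):

```java
import java.util.Optional;
import java.util.TreeMap;

class CodeKey {
    // Returns the codes in sorted order joined by '_', or empty if a code repeats.
    static Optional<String> canonical(String combined) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String code : combined.split("_")) {
            code = code.trim();
            if (code.isEmpty() || code.equals("-")) {
                continue;                           // skip unassigned slots
            }
            counts.merge(code, 1, Integer::sum);
        }
        for (int n : counts.values()) {
            if (n > 1) {
                return Optional.empty();            // repeated code: invalid combination
            }
        }
        // TreeMap iterates its keys in sorted order, so this key is order-independent.
        return Optional.of(String.join("_", counts.keySet()));
    }
}
```

With this, "f_g_e_-" and "e_g_f_-" both reduce to the key "e_f_g", so the name table only needs one entry per valid combination.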

Java Algorithm String-ID-generation hierarchical parent child

What is a good algorithm to generate unique IDs to be put into a Map<String, Entity> with Entity being a container/folder class that can contain other Entities and String being the ID? I think when generating a new Entity it should always use the ID of its parent, so right now what I do is
Integer.toString(Math.abs((parentName + entityName).hashCode()));
But it seems pretty inefficient as the ID can be a String but may not contain "-", so it contains only numbers when it may as well contain letters and Math.abs halves the number of possible IDs. Oh, and the ID has to be of same length (8 letters). It has only to function as a key in the map and inside an XML-file and does not have to be secure.
There doesn't seem to be any advantage to including the parent id in the child id. A potential advantage of this would be to find all children by their parent id (i.e. return all ids that start with parent_id), but you're hashing the concatenated id and you have a max id length which makes this approach infeasible.
If your keys don't have to be secure then a counter would be efficient and guarantee uniqueness. A sample implementation would be to generate ids composed of case-sensitive alphanumerics, which would give you about 10^14 ids (you can also add special characters to increase the number of ids). You'll need an array of 62 characters: indices 0-25 have lowercase letters, indices 26-51 have uppercase letters, and indices 52-61 have numbers. You'll also need a state array of 8 integers (or shorts or bytes), initialized to all 0's. To retrieve an id, use the state array to look up characters in the character array and concatenate them together (so a state of {0, 1, 2, 0, 1, 2, 0, 1} generates an id of "abcabcab"); then increment the 0th index of the state array, if this results in a number greater than 61 then set the 0th index to 0 and increment the 1st index of the state array, if this results in a number greater than 61 then set the 1st index to 0 and increment the 2nd index of the state array, etc.
I suggest that you use a StringBuilder to concatenate the substrings, otherwise you're going to generate a lot of garbage strings. You might also be able to replace the state array with a StringBuilder, using StringBuilder#replace in place of the int/short/byte increment operations.
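A small sketch of the counter described above, assuming a single generator shared by all callers (the class name IdCounter is made up; the method is synchronized only as the simplest way to keep it safe across threads):

```java
class IdCounter {
    // Indices 0-25 lowercase, 26-51 uppercase, 52-61 digits.
    private static final char[] ALPHABET =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789".toCharArray();

    // State array of 8 "digits", all starting at 0.
    private final int[] state = new int[8];

    synchronized String next() {
        // Look up each state entry in the character array and concatenate.
        StringBuilder id = new StringBuilder(state.length);
        for (int digit : state) {
            id.append(ALPHABET[digit]);
        }
        // Increment index 0; on overflow reset it to 0 and carry to the next index.
        for (int i = 0; i < state.length; i++) {
            if (++state[i] < ALPHABET.length) {
                break;
            }
            state[i] = 0;
        }
        return id.toString();
    }
}
```

The first ids are "aaaaaaaa", "baaaaaaa", and so on. Note that this sketch silently wraps around to "aaaaaaaa" once all 62^8 ids have been handed out.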
If your application is multi-threaded then the counter can become a bottleneck. One way to fix this is for each worker thread to reserve either 62 or 62^2 ids, for example: ID_Thread is the thread with the id generator, and its getBatchId method is synchronized and returns a copy of the state array. ID_Thread increments the 2nd index of the state array (not the 0th index), if this results in a number greater than 61 then it sets the 2nd index to 0 and increments the 3rd index, etc. Meanwhile, Worker_Thread has called getBatchId and now has a copy of a state array; it uses this to generate ids, after which it increments the 0th index of the state array, if this results in a number greater than 61 then it sets the 0th index to 0 and increments the 1st index, and if this results in a number greater than 61 then it calls getBatchId for a new state array. This means that the Worker_Thread instances only need to call a synchronized method for one out of every 62^2 ids.
An alternative multi-threaded implementation would be for Id_Thread to continually generate ids and place them in a BlockingQueue (with maximum queue size of, say, 32), with the Worker_Thread instances pulling ids from this queue.

Order a list of characters, given a dictionary

I was asked this question in an interview. Suppose you have an ordered dictionary, and are given a list of unordered characters- how would you order these characters by precedence? This dictionary contains words where all the 26 characters are guaranteed to appear. However, note that the size of the dictionary might be anything. The dictionary could be as small as a few words and may not have separate sections for each character e.g., there might be no sections for words beginning with a; although a will appear as part of another word e.g., "bat".
The dictionary might be "ordered" (/sarcasm) as such: "zebra", "apple", "cat", "crass", and if you're given the list {a, z, r}, the correct order would be {z, a, r}. Since "zebra" is before "apple" in the dictionary, we know z comes before a in the precedence. Since "apple" comes before "cat", we know a comes before c. Since "cat" comes before "crass", we know that a comes before r. This ordering leaves c and r with ambiguous precedence, but since the list of letters was {a, z, r}, we know the solution to be {z, a, r}.
Use a directed graph with 26 vertices, each vertex represents a character. An edge from vertex A to vertex B means in the alphabet B is in front of A.
The first step is to establish such a graph with only vertices but NO edges.
Second, scan the input dictionary word by word, comparing each word with the previous one. Each comparison yields at most one relationship, taken from the first position at which the two words differ. Add an edge to the graph for it. Assuming the dictionary is correct, there should be no conflicts.
After you have finished the dictionary, output the alphabet as follows:
1. Pick a random vertex and traverse its path until you find the character that points to nothing. This is the first character in the alphabet. Output it and delete it from the graph.
2. Repeat step 1 until all vertices are deleted.
EDIT:
To better explain this algorithm, let's run it on your sample input.
Input: {"zebra", "apple", "cat", "crass"}
Word 0 and word 1, we immediately know that z comes before a, so we make an edge a->z
Word 1 and word 2, we immediately know that a comes before c, so we make another edge c->a
Word 2 and Word 3, the first letters are the same "c", but the second ones differ, so we learn that a comes before r, so we have another edge r->a
Now all the words are read. Output the order by picking a vertex at random (say we pick c); we then have c->a->z in the graph. Output z and delete z from the graph (mark it as NULL). Now pick another one (say we pick r); we find r->a in the graph. We output a and delete a from the graph. Now we pick another one (say we pick c again); there's no path found, so we just output c and delete it. Now we pick the last one, r; there's no path again, so we output r and delete it. Since all vertices are deleted, the algorithm is done.
The output is z, a, c, r. The ordering of "c" and "r" are random since we don't really know their relationship from the input.
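For comparison, here is a sketch of the same graph idea in Java. It is not a literal transcription of the steps above: the edges point from the earlier letter to the later one (the reverse of the direction used in this answer) so that a standard Kahn-style topological sort can be applied, and the class name DictionaryOrder is invented for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DictionaryOrder {
    static List<Character> order(List<String> dictionary, Set<Character> letters) {
        Map<Character, Set<Character>> after = new HashMap<>(); // u -> letters known to follow u
        Map<Character, Integer> indegree = new HashMap<>();
        for (String word : dictionary) {
            for (char c : word.toCharArray()) {
                after.putIfAbsent(c, new HashSet<>());
                indegree.putIfAbsent(c, 0);
            }
        }
        // Each adjacent pair of words gives at most one edge: the first differing position.
        for (int i = 1; i < dictionary.size(); i++) {
            String prev = dictionary.get(i - 1), cur = dictionary.get(i);
            int n = Math.min(prev.length(), cur.length());
            for (int k = 0; k < n; k++) {
                char u = prev.charAt(k), v = cur.charAt(k);
                if (u != v) {
                    if (after.get(u).add(v)) {
                        indegree.merge(v, 1, Integer::sum);
                    }
                    break;
                }
            }
        }
        // Kahn's algorithm: repeatedly output a letter with no remaining predecessors.
        Deque<Character> ready = new ArrayDeque<>();
        for (Map.Entry<Character, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<Character> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            char u = ready.remove();
            if (letters.contains(u)) result.add(u);   // only report the requested letters
            for (char v : after.get(u)) {
                if (indegree.merge(v, -1, Integer::sum) == 0) ready.add(v);
            }
        }
        return result;
    }
}
```

Letters whose relative order the dictionary does not determine come out in an arbitrary but consistent order, just like c and r in the trace above.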
From the fact that "zebra" < "apple" < "cat" < "crass", the most efficient way to derive the per-character relationships is to have a loop consider the Nth character of all words, where N is initially 0 yielding the relationships "z" < "a" < "c". That loop can recursively extract relationships for the (N + 1)th character for groups of words with the same prefix (i.e. text in positions <= N). Doing that for N == 1 with same-prefixed "cat" and "crass" yields the relationship "a" < "r".
We can represent known relationships in a 2 dimensional array of x < y truth values.
y\x  a   b   c  ...  r  ...  z
a    -       N       N       Y
b        -
c    Y       -               Y
r    Y               -
z    N       N               -
The brute force approach is to iterate over all pairs of characters in the input list (i.e. {a, z, r} -> az, ar, zr) looking up the table for a<z, a<r, z<r: if this is ever false, then swap the characters and restart the whole she-bang. When you make it through the full process without having had to swap any more characters, the output is sorted consistently with the rules. This is a bit like doing a bubble sort.
To make this faster, we can be more proactive about populating cells in our table for implied relationships: for example, we know "z" < "a" < "c" and "a" < "r", so we deduce that "z" < "r". We could do this by running through the "naive" table above to find everything we know about each character (e.g. that z<a and z<c) - then run through what we know about a and c. To avoid excessively deep trees, you could just follow one level of indirection like this, then repeat until the table was stable.
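A sketch of the table approach, under a few assumptions: letters are lowercase a-z, the direct relationships are taken from adjacent dictionary words, and the implied relationships are filled in with a full transitive closure (Warshall's algorithm) rather than one level at a time. The class name PrecedenceTable is made up.

```java
import java.util.List;

class PrecedenceTable {
    // less[x][y] == true means letter x is known to come before letter y.
    static boolean[][] build(List<String> dictionary) {
        boolean[][] less = new boolean[26][26];
        for (int i = 1; i < dictionary.size(); i++) {
            String prev = dictionary.get(i - 1), cur = dictionary.get(i);
            int n = Math.min(prev.length(), cur.length());
            for (int k = 0; k < n; k++) {
                if (prev.charAt(k) != cur.charAt(k)) {
                    less[prev.charAt(k) - 'a'][cur.charAt(k) - 'a'] = true;
                    break;
                }
            }
        }
        // Fill in implied relationships: x < m and m < y gives x < y.
        for (int m = 0; m < 26; m++)
            for (int x = 0; x < 26; x++)
                for (int y = 0; y < 26; y++)
                    if (less[x][m] && less[m][y]) less[x][y] = true;
        return less;
    }

    // Brute force: swap any pair that is known to be out of order, then restart.
    static void sort(char[] letters, boolean[][] less) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i < letters.length && !swapped; i++) {
                for (int j = i + 1; j < letters.length && !swapped; j++) {
                    if (less[letters[j] - 'a'][letters[i] - 'a']) {
                        char tmp = letters[i];
                        letters[i] = letters[j];
                        letters[j] = tmp;
                        swapped = true;
                    }
                }
            }
        }
    }
}
```

For the sample dictionary, sorting {a, z, r} swaps a and z once and then makes a clean pass, leaving {z, a, r}.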
Based on how you describe the problem, your example is incorrect. Your answer should be {z,r,a}. However that may be, below is code that solves the problem by ordering the letters by first appearance in the dictionary. You can modify it to return an order different from my supposed {z,r,a}.
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

Set<Character> charPrecedence(List<String> dictionary, char[] letters) {
    // LinkedHashSet keeps insertion order; a plain HashSet would lose the ordering
    Set<Character> result = new LinkedHashSet<Character>();
    // since your characters are the 26 letters instead of the 256 chars
    // a bit vector won't do; you need a map or set
    Set<Character> alphabets = new HashSet<Character>();
    for (char c : letters)
        alphabets.add(c);
    // now get to work
    for (String word : dictionary) {
        if (alphabets.isEmpty()) return result;
        for (char c : word.toCharArray()) {
            // remove returns true only the first time a requested letter is seen
            if (alphabets.remove(c))
                result.add(c);
        }
    }
    // since the dictionary is guaranteed to contain all the 26 letters,
    // per the problem statement, then at this point your work is done.
    return result;
}
best case O(1); worst case O(n) where n is the number of characters in the dictionary, i.e., one particular letter appears only once and is the last character you check.

How to find repeated sequences of events

I'm trying to find an efficient algorithm for identifying a reoccurring sequence of characters. Let's say the sequence must be a minimum of 3 characters long, yet only the maximum-length sequence is returned. The dataset could potentially be thousands of characters. Also, I only want to know about the sequence if it's repeated, let's say, 3 times.
As an example:
ASHEKBSHEKCSHEDSHEK
"SHEK" occurs 3 times and would be identified. "SHE" occurs 4 times, but isn't identified since "SHEK" is the maximum length sequence that contains that sequence.
Also, no "seed" sequence is fed to the algorithm, it must find them autonomously.
Thanks in advance,
j
Try building a suffix array for the string.
Online builder: http://allisons.org/ll/AlgDS/Strings/Suffix/
Then compare the beginnings of consecutive entries in the suffix array to find matches.
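A naive sketch of the suffix array idea, assuming overlapping occurrences count and that sorting suffixes by plain string comparison is fast enough for a few thousand characters (the class name RepeatedSequence is hypothetical). In the sorted array, whatever prefix the first and last of minCount consecutive suffixes share is also shared by every suffix between them, so it occurs at least minCount times.

```java
import java.util.Arrays;

class RepeatedSequence {
    static String longestRepeated(String s, int minLen, int minCount) {
        // Suffix array: start offsets of all suffixes, sorted by the suffix text.
        Integer[] sa = new Integer[s.length()];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = i;
        }
        Arrays.sort(sa, (a, b) -> s.substring(a).compareTo(s.substring(b)));

        String best = "";
        // Slide a window of minCount consecutive suffixes over the sorted array.
        for (int i = 0; i + minCount - 1 < sa.length; i++) {
            int len = commonPrefix(s, sa[i], sa[i + minCount - 1]);
            if (len >= minLen && len > best.length()) {
                best = s.substring(sa[i], sa[i] + len);
            }
        }
        return best;    // empty string if nothing qualifies
    }

    private static int commonPrefix(String s, int a, int b) {
        int n = 0;
        while (a + n < s.length() && b + n < s.length()
                && s.charAt(a + n) == s.charAt(b + n)) {
            n++;
        }
        return n;
    }
}
```

For the example string with minLen 3 and minCount 3 this picks "SHEK". A proper suffix array construction would avoid the quadratic cost of comparing substrings.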
Looks like Rabin-Karp Wiki Entry
If you consider that there exist \sum(n) / 2 possible starting strings, and you aren't looking for simply a match, but the substring with the most matches, I think your algorithm will have a terrible theoretical complexity if it is to be correct and complete.
However, you might get some practical speed using a Trie. The algorithm would go something like this:
For each offset into the string...
    For each sub-string length starting at that offset...
        Insert the sub-string into the trie. Each node in the trie has a data value (a "count" integer) that you increment by one each time you visit the node.
Once you've built-up the trie to model your data, delete all the sub-trees from the trie with roots below some optimization threshold (3 in your case).
Those remaining paths should be few enough in number for you to efficiently sort-and-pick the ones you want.
I suggest this as a starting point because the Trie is built to manipulate common prefixes and, as a side-effect, will compress your dataset.
A personal choice I would make would be to identify the location of the sub-strings as a separate process after identifying the ones I want. Otherwise you are going to store every substring location, and that will explode your memory. Your computation is already pretty complex.
Hope this makes some sense! Good luck!
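A rough sketch of the trie idea, with made-up names (SubstringTrie, longestFrequent). Instead of physically deleting sub-trees below the threshold, it simply refuses to descend into them during the search, which has the same effect. It uses memory quadratic in the input length in the worst case, as warned above.

```java
import java.util.HashMap;
import java.util.Map;

class SubstringTrie {
    static final class Node {
        int count;                                          // how many times this node was visited
        final Map<Character, Node> children = new HashMap<>();
    }

    static String longestFrequent(String s, int minLen, int minCount) {
        Node root = new Node();
        // For each offset, walk down the trie one character per sub-string length.
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int end = start; end < s.length(); end++) {
                node = node.children.computeIfAbsent(s.charAt(end), c -> new Node());
                node.count++;
            }
        }
        String best = search(root, new StringBuilder(), minCount);
        return best.length() >= minLen ? best : "";
    }

    // Depth-first search for the longest path whose nodes all meet the threshold.
    private static String search(Node node, StringBuilder path, int minCount) {
        String best = path.toString();
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (e.getValue().count < minCount) {
                continue;                                   // treat this sub-tree as deleted
            }
            path.append(e.getKey());
            String candidate = search(e.getValue(), path, minCount);
            if (candidate.length() > best.length()) {
                best = candidate;
            }
            path.setLength(path.length() - 1);              // backtrack
        }
        return best;
    }
}
```

Locating where the chosen sub-string occurs can then be done as a separate pass, for example with String.indexOf, as suggested above.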
Consider the following algorithm, where:
str is the string of events
T(i) is the suffix tree for the substring str(0..i).
T(i+1) is quickly obtained from T(i), for example using this algorithm
For each character position i in the input string str, traverse a
path starting at the root of T(i), along edges, labeled with
successive characters from the input, beginning from position i + 1.
This path determines a repeating string. If the path is longer than
the previously found paths, record the new max length and the position
i + 1.
Update the suffix tree with str [i+1] and repeat for the next position.
Something like this pseudocode:
max.len = 0
max.off = -1
T = update_suffix_tree (nil, str [0])
for i = 1 to len (str)
    r = root (T)
    j = i + 1
    while j < len (str) and r.child (str [j]) != nil
        r = r.child (str [j])
        ++j
    if j - i - 1 > max.len
        max.len = j - i - 1
        max.off = i + 1
    T = update_suffix_tree (T, str [i+1])
In the kth iteration, the inner while is executed for at most n -
k iterations and the suffix tree construction is O(k), hence the
loop body's complexity is O(n) and it's executed n-1 times,
therefore the whole algorithm complexity is O(n^2).
