real life situations using palindrome algorithm - java

So, for the final project in my data structures class, we have to develop algorithms for palindromes, but I'd like to fancy it up a bit and turn it into a mini program. What real-life situations mimic the use of palindromes, apart from work on strings?
Thanks!

In real life, palindromes could be used in some compression algorithms.
For example, there is research on biological sequence compression algorithms that exploit this property.
HERE, HERE and HERE more details

Palindromes are used in DNA for marking and permitting cutting. They are also used to fold a one-dimensional chain into a two- or three-dimensional structure.

One interesting application of the Longest Palindromic Substring (LPS) problem, which Manacher's algorithm solves, that came to my mind: while playing the Indian Rummy card game (aka Rummy 13), if the LPS has length 5, then the card at the middle of that substring is a great candidate for the next selection, because it would give you two rummies. LPS lengths of 6, 4, etc. have similar potential.
Example (played with more than one deck): there are 13 cards, 5 of which are as below:
Clubs: 2, 3, 4, 3, 2
Let this be the longest palindrome we get out of all 13 cards. The middle card is Clubs 4, which is a great candidate to be selected next.
Because if you get a Clubs 4 in the next round, you could have two rummies: 234 and 234.
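A minimal sketch of finding that palindromic run, assuming the cards of one suit are represented as integer ranks in hand order. It uses the simpler expand-around-center method (quadratic time) rather than Manacher's algorithm, and the class and method names are made up for illustration.

```java
final class LongestPalindromicRun {
    /** Returns {startIndex, length} of the longest palindromic run in seq. */
    static int[] find(int[] seq) {
        int bestStart = 0;
        int bestLen = seq.length == 0 ? 0 : 1;
        for (int center = 0; center < seq.length; center++) {
            // parity 0 expands around a single card, parity 1 around a pair
            for (int parity = 0; parity <= 1; parity++) {
                int lo = center;
                int hi = center + parity;
                while (lo >= 0 && hi < seq.length && seq[lo] == seq[hi]) {
                    lo--;
                    hi++;
                }
                int len = hi - lo - 1; // lo and hi have stepped one past the run
                if (len > bestLen) {
                    bestLen = len;
                    bestStart = lo + 1;
                }
            }
        }
        return new int[] {bestStart, bestLen};
    }
}
```

For an odd-length result, the middle card is `seq[start + length / 2]`, which is the Clubs 4 in the example hand.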

Palindromes are strings that read the same forwards as backwards such as:
A man, a plan, a canal, Panama!
Was it Eliot's toilet I saw?
Dennis And Edna Sinned
There aren't many real-world applications for this, and finding Palindromes is fairly specific to Strings... Even numeric palindromes operate on the digits within a String...
ie. 580085
is a numeric palindrome, but would still be found by analysing characters in a String.
However, the skills you get from learning to traverse Strings in reverse, recognise special cases such as shared middle characters, perform case insensitive comparisons and strip out non alphanumeric characters from Strings when performing comparisons are useful to all sorts of real-world applications.
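As a rough sketch of those skills in one place, the check below walks the string from both ends, skips non-alphanumeric characters, and compares case-insensitively. The class name is illustrative, and it assumes plain `char` comparison is good enough (no special handling of surrogate pairs).

```java
final class PalindromeCheck {
    /** True if s reads the same both ways, ignoring case and punctuation. */
    static boolean isPalindrome(String s) {
        int lo = 0;
        int hi = s.length() - 1;
        while (lo < hi) {
            char a = s.charAt(lo);
            char b = s.charAt(hi);
            if (!Character.isLetterOrDigit(a)) {
                lo++;                       // skip punctuation/space on the left
            } else if (!Character.isLetterOrDigit(b)) {
                hi--;                       // skip punctuation/space on the right
            } else if (Character.toLowerCase(a) != Character.toLowerCase(b)) {
                return false;
            } else {
                lo++;
                hi--;
            }
        }
        return true; // pointers met, which also covers a shared middle character
    }
}
```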

Related

What Crossover Method should I use for crossing Postfix expressions in Genetic Algorithm?

I'm building a project whose main objective is to find a given number (if possible, otherwise closest possible) using 6 given numbers and main operators (+, -, *, /). Idea is to randomly generate expressions, using the numbers given and the operators, in reverse polish (postfix) notation, because I found it the easiest to generate and compute later. Those expressions are Individuals in Population of my Genetic Algorithm. Those expressions have the form of an ArrayList of Strings in Java, where Strings are both the operators and operands (the numbers given).
The main question here is, what would be the best method to crossover these individuals (postfix expressions actually)? Right now I'm thinking about crossing expressions that are made out of all the six operands that are given (and 5 operators with them). Later I'll probably also cross the expressions that would be made out of less operands (5, 4, 3, 2 and also only 1), but I guess that I should figure this out first, as the most complex case (if you think it might be a better idea to start differently, I'm open to any suggestions). So, the thing is that every expression is made from all the operands given, and also the child expression should have all the operands included, too. I understand that this requires some sort of ordered crossover (often used in problems like TSP), and I read a lot about it (for example here where multiple methods are described), but I didn't quite figure out which one would be best in my case (I'm also aware that in Genetic Algorithms there is a lot of 'trial and error' process, but I'm talking about something else here).
What is bothering me is the operators. If I had only a list of operands, it wouldn't be a problem to cross two such lists, for example by taking a random subarray of half the elements from parent 1 and filling the rest with the remaining elements from parent 2, keeping their order. But here, if I take, say, the first half of an expression from the first parent, I would definitely have to fill the child expression with the remaining operands, but what should I do with the operators? Take them from parent 2 like the remaining operands (but then I would have to be careful, because in order to use an operator in a postfix expression I need to have at least one more operand than operators so far, and checking that all the time might be time consuming, or not?), or maybe generate random operators for the rest of the child expression (but that wouldn't be a pure crossover then, would it)?
When talking about crossover, there is also mutation, but I guess I have that worked out. I can take an expression and perform a mutation where I'll just switch 2 operands, or take an expression and randomly change 1 or more operators. For that, I have some ideas, but the crossover is what really bothers me.
So, that pretty much sums up my problem. Like I said, the main question is how to do the crossover, but if you have any other suggestions or questions about the program (maybe a representation of expressions other than a list of strings that is easier or faster to cross over, maybe something I didn't mention here, maybe even a whole new approach to the problem), I'd love to hear them. I didn't give any code here because I don't think it's needed to answer this question, but if you think it would help, I'll edit it in. Once more, the main question is how to do the crossover (an idea or pseudocode is expected, although code itself would be great, too :D), but if you think I need to change something else, or you know some other solution to my whole problem, feel free to say so.
Thanks in advance!
There are two approaches that come to mind:
Approach #1
Encode each genome as a fixed length expression where odd indices are numbers and even indices are the operators. For mutation, you could slightly change the numbers and/or change the operators.
Pros:
Very simple to code
Cons:
Would have to create an infix parser
Fixed length expressions
Approach #2
Encode each genome as a syntax tree. For instance, 4 + 3 / 2 - 1 is equivalent to Add(4, Subtract(Divide(3, 2), 1)) which looks like:
     _____+_____
    |           |
    4       ____-____
           |         |
         __/__       1
        |     |
        3     2
Then when crossing over, pick a random node from each tree and swap them. For mutation, you could add, remove, and/or modify random nodes.
Pros:
Might find better results
Variable length expressions
Cons:
Adds time complexity
Adds programming complexity
Here is an example of the second approach:
Source
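A minimal sketch of the subtree swap described in Approach #2, assuming binary operators only. All names here (`ExprTree`, `Node`, `become`, `crossover`) are hypothetical. Note that a plain subtree swap does not guarantee that each child uses every operand exactly once, so that constraint from the question would need an extra repair or rejection step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class ExprTree {
    /** Either a leaf holding a number (op == 0) or an operator with two children. */
    static final class Node {
        char op;        // '+', '-', '*', '/', or 0 for a leaf
        double value;   // only meaningful for leaves
        Node left, right;

        Node(double value) { this.value = value; }

        Node(char op, Node left, Node right) {
            this.op = op; this.left = left; this.right = right;
        }

        double eval() {
            if (op == 0) return value;
            double a = left.eval(), b = right.eval();
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  return a / b;
            }
        }

        Node copy() {
            return op == 0 ? new Node(value) : new Node(op, left.copy(), right.copy());
        }

        /** Overwrites this node in place, so its parent needs no update. */
        void become(Node other) {
            op = other.op; value = other.value; left = other.left; right = other.right;
        }
    }

    /** Pre-order list of every node in the tree. */
    static List<Node> nodes(Node root) {
        List<Node> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node n, List<Node> out) {
        if (n == null) return;
        out.add(n);
        collect(n.left, out);
        collect(n.right, out);
    }

    /** Copies both parents, then swaps one randomly chosen subtree between the copies. */
    static Node[] crossover(Node p1, Node p2, Random rng) {
        Node c1 = p1.copy(), c2 = p2.copy();
        List<Node> n1 = nodes(c1), n2 = nodes(c2);
        Node a = n1.get(rng.nextInt(n1.size()));
        Node b = n2.get(rng.nextInt(n2.size()));
        Node oldA = a.copy();
        a.become(b.copy());
        b.become(oldA);
        return new Node[] {c1, c2};
    }
}
```

Copying the parents first keeps them intact in the population, which is a design choice rather than a requirement.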

8 puzzle: Solvability and shortest solution

I have built an 8 puzzle solver using Breadth First Search. I would now like to modify the code to use heuristics. I would be grateful if someone could answer the following two questions:
Solvability
How do we decide whether an 8 puzzle is solvable ? (given a starting state and a goal state )
This is what Wikipedia says:
The invariant is the parity of the permutation of all 16 squares plus
the parity of the taxicab distance (number of rows plus number of
columns) of the empty square from the lower right corner.
Unfortunately, I couldn't understand what that meant. Can someone explain it in simpler language?
Shortest Solution
Given a heuristic, is A* guaranteed to give the shortest solution? To be more specific, will the first node in the open list always have a depth (or number of moves made so far) which is the minimum of the depths of all the nodes present in the open list?
Should the heuristic satisfy some condition for the above statement to be true?
Edit : How is it that an admissible heuristic will always provide the optimal solution? And how do we test whether a heuristic is admissible?
I would be using the heuristics listed here
Manhattan Distance
Linear Conflict
Pattern Database
Misplaced Tiles
Nilsson's Sequence Score
N-MaxSwap X-Y
Tiles out of row and column
For clarification from Eyal Schneider :
I'll refer only to the solvability issue. Some background in permutations is needed.
A permutation is a reordering of an ordered set. For example, 2134 is a reordering of the list 1234, where 1 and 2 swap places. A permutation has a parity property; it refers to the parity of the number of inversions. For example, in the following permutation you can see that exactly 3 inversions exist (23,24,34):
1234
1432
That means that the permutation has an odd parity. The following permutation has an even parity (12, 34):
1234
2143
Naturally, the identity permutation (which keeps the items order) has an even parity.
Any state in the 15 puzzle (or 8 puzzle) can be regarded as a permutation of the final state, if we look at it as a concatenation of the rows, starting from the first row. Note that every legal move changes the parity of the permutation (because we swap two elements, and the number of inversions involving items in between them must be even). Therefore, if you know that the empty square has to travel an even number of steps to reach its final state, then the permutation must also be even. Otherwise, you'll end with an odd permutation of the final state, which is necessarily different from it. Same with odd number of steps for the empty square.
According to the Wikipedia link you provided, the criterion above is both necessary and sufficient for a given puzzle to be solvable.
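The parity argument above can be sketched as a small check. This assumes the board is given row by row as an array with 0 for the empty square and tiles numbered 0 to n-1. The blank is included in the permutation, so every move is one swap and the blank's taxicab distance changes by one. The class and method names are illustrative.

```java
final class PuzzleSolvability {
    /**
     * True if start can reach goal: the parity of the permutation (blank included)
     * must match the parity of the blank's taxicab distance to its goal cell.
     */
    static boolean isSolvable(int[] start, int[] goal, int width) {
        int n = start.length;
        int[] goalIndex = new int[n];
        for (int i = 0; i < n; i++) {
            goalIndex[goal[i]] = i;
        }

        // Express start as a permutation of goal positions.
        int[] perm = new int[n];
        int blankStart = -1;
        for (int i = 0; i < n; i++) {
            perm[i] = goalIndex[start[i]];
            if (start[i] == 0) blankStart = i;
        }

        // Count inversions: pairs that appear in the wrong relative order.
        int inversions = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (perm[i] > perm[j]) inversions++;
            }
        }

        int blankGoal = goalIndex[0];
        int distance = Math.abs(blankStart / width - blankGoal / width)
                     + Math.abs(blankStart % width - blankGoal % width);

        return (inversions + distance) % 2 == 0;
    }
}
```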
The A* algorithm is guaranteed to find a shortest solution (one of them, if several are equally short) if your heuristic never overestimates the real cost (in your case, the real number of moves needed to reach the solution). Such a heuristic is called admissible.
Off the top of my head I cannot come up with a really good heuristic for your problem. Finding one needs some thought.
The real art of using A* is to find a heuristic that never overestimates the real cost but underestimates it as little as possible, to speed up the search.
First ideas for such a heuristic:
A quite bad but valid heuristic that popped into my mind is the Manhattan distance of the empty field to its final destination.
The sum of the Manhattan distances of each field to its final destination, divided by the maximum number of fields that can change position within one move. (I think this is quite a good heuristic.)
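A minimal sketch of the second idea, assuming the board is a row-by-row array with 0 as the empty field and tiles numbered 0 to n-1. In the sliding puzzle only one tile changes position per move, so the divisor is 1 when the empty field is left out of the sum, which is what this sketch does. The names are illustrative.

```java
final class ManhattanHeuristic {
    /** Sum of taxicab distances of every tile (not the blank) to its goal cell. */
    static int estimate(int[] state, int[] goal, int width) {
        int[] goalIndex = new int[goal.length];
        for (int i = 0; i < goal.length; i++) {
            goalIndex[goal[i]] = i;
        }
        int sum = 0;
        for (int i = 0; i < state.length; i++) {
            int tile = state[i];
            if (tile == 0) continue; // counting the blank could overestimate
            int g = goalIndex[tile];
            sum += Math.abs(i / width - g / width) + Math.abs(i % width - g % width);
        }
        return sum;
    }
}
```

Each move shifts exactly one tile by one cell, so this sum can drop by at most 1 per move and therefore never exceeds the true number of moves remaining.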
For anyone coming along, I will attempt to explain how the OP got the value pairs, as well as how he determines the highlighted ones (i.e. the inversions), as it took me several hours to figure it out. First, the pairs.
Take the goal state and imagine it as a 1D array (A, for example):
[1,2,3,8,0,4,7,5]. Each value in that array has its own column in the table (going all the way down), and it is the first value of each pair in that column.
Then move one value to the right in the array (i + 1) and go all the way down again; these are the second pair values. For example (state A): in the first column, the second values run through [2,3,8,0,4,7,5] going down; in the second column, through [3,8,0,4,7,5]; etc.
Now for the inversions. For each of the two values in a pair, find its INDEX in the start state. If the left INDEX > the right INDEX, then it's an inversion (highlighted). The first four pairs of state A are: (1,2), (1,3), (1,8), (1,0)
1 is at index 3
2 is at index 0
3 > 0, so inversion
1 is at index 3
3 is at index 2
3 > 2, so inversion
1 is at index 3
8 is at index 1
3 > 1, so inversion
1 is at index 3
0 is at index 7
3 < 7, so no inversion
Do this for each pair and tally up the total inversions.
If the Manhattan distance of the blank spot and the total number of inversions are both even or both odd, then the puzzle is solvable. Hope this helps!

Getting/Applying capitalization mask before/after encoding?

My project takes a String s and passes an all-lower-case version, s.toLowerCase(), to a lossless encoder.
I can encode/decode the lower-case string just fine, but this obviously would not be practical, so I need to be able to preserve the original String's capitalization somehow.
I was thinking of using Character.isUpperCase() to get an array of integers UpperCaseLetters[] that represents the locations of all capital letters in s. I would then use this array to place a ^ at all locations UpperCaseLetters[i] + 1 in the encoded string. When decoding the string, I would know that every character preceding a ^ is capital. (By the way, this encoder will never generate ^ when encoding.)
This method seems sloppy to me, though. I was also thinking of using bit strings to represent capitalization, but the overall goal of the application is compression, so that would not be very efficient.
Is there an easier way to get and apply capitalization masks for strings? If there is, how much "storage" would it need?
Your options:
Auto-capitalize:
Use a general algorithm for capitalization, then use one of the techniques below to record only the letters where the generated capitalization differs from the actual one. To regenerate, just run the algorithm again and flip the capitalization of all the recorded letters. Assuming capital letters appear where they should (e.g. at the start of sentences), this will slow the algorithm down slightly (only by a small constant factor of n, and decent compression is generally much slower than that) and reduce the amount of storage space required a little.
Bitmap of capital positions:
You've already covered this one, not particularly efficient.
Prefix capitals with identifying character:
Also already covered, except that you described postfix, but prefix is generally better and, for a more generic solution, you can also escape the ^ with ^^. Not a bad idea. Depending on the compression, it might be a good idea to instead use a letter that already appears in the dataset. Either the most or least common letter, or you may have to look at the compression algorithm and do quite a bit of processing to determine the ideal letter to use.
Store distance of capital from start in any format:
Has no advantage over distance to next capital (below).
Distance to next capital - non-bitstring representation:
Generally less efficient than using bitstrings.
Bit string = distance to next capital:
You have a sequence of lengths, each indicating, in order, the distance between capitals. So if we have distances 0,3,1,0,5, capitalization would be as follows: AbcdEfGHijklmNo (skip 0 characters to the first, 3 characters to the second, 1 character to the third, etc.). There are some options available to store this:
Fixed length: Not a good idea since it needs to be = the longest possible distance. An obvious alternative is having some sort of overflow into the next length, but this still uses too much space.
Fixed length, different settings: Best explained with an example. The first 2 bits indicate the length: 00 means 2 bits follow to indicate the distance, 01 means 4 bits, 10 means 8 bits, 11 means 16 bits. If there's a chance of needing more than 16 bits, you may want to do something like 110 means 16 bits, 1110 means 32 bits, 11110 means 64 bits, etc. (this may sound similar to determining the class of an IPv4 address). So 0001010100 would split into 00-01, 01-0100, thus distances 1, 4. Note that the lengths don't have to increase in powers of 2. 16 bits = 65535 characters is a lot and 2 bits = 3 is very little; you can probably make it 4, 6, 8, (16?), (32?), ??? (unless there are a few capitals in a row, in which case you probably want 2 bits as well).
Variable length using escape sequence: Say the escape sequence is 00. We want to use all bit strings that don't contain 00, so the bit value table will look as follows:
Bits    Value
1       1
10      2
11      3
101     4   // skipped 100
110     5
111     6
1010    7   // skipped 1000 and 1001
10100101010010101000101000010 will split into 101, 10101, 101010, 101, 0, 10. Note that ...1001.. just causes a split ending at the left 1 and a split starting at the right 1, and ...10001... causes a split ending at the first 0 and a split starting at the right 1, and ...100001... indicates a 0-valued distance in between. The pseudo-code is something like:
if (current value == 1 && zeroCount < 2)
    add to current split
    zeroCount = 0
else if (current value == 1) // after 00...
    if (zeroCount % 2 == 1) { add zero to current split; zeroCount--; }
    record current split, clear current split
    while (zeroCount > 2) { record 0-distance split; zeroCount -= 2; }
else zeroCount++
This looks like a good solution for short distances, but once the distances become large I suspect you start skipping too many values and the length increases too quickly.
There is no ideal solution, it greatly depends on the data, you'll have to play around with prefixing capitals and different options for bit string distances to see which is best for your typical dataset.
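To make two of the options above concrete, here is a rough sketch of the prefix-marker approach (with ^^ as the escape for a literal ^) and of computing the distance-to-next-capital sequence. The class and method names are made up for illustration, and the round trip assumes letters whose case conversion is reversible, which holds for ASCII but not for every Unicode character.

```java
import java.util.ArrayList;
import java.util.List;

final class CapitalMask {
    static final char MARK = '^';

    /** Lower-cases s, prefixing each former capital with ^ and escaping ^ as ^^. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == MARK) {
                out.append(MARK).append(MARK);
            } else if (Character.isUpperCase(c)) {
                out.append(MARK).append(Character.toLowerCase(c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses encode: ^^ becomes ^, and ^x becomes upper-case X. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == MARK && i + 1 < s.length()) {
                char next = s.charAt(++i);
                out.append(next == MARK ? MARK : Character.toUpperCase(next));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Number of non-capital characters skipped before each capital, in order. */
    static List<Integer> capitalGaps(String s) {
        List<Integer> gaps = new ArrayList<>();
        int gap = 0;
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                gaps.add(gap);
                gap = 0;
            } else {
                gap++;
            }
        }
        return gaps;
    }
}
```

The gap list is what any of the bit string encodings above would then store; which encoding is smallest depends on the data.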

Efficiency between lists and methods

I was thinking of making a Sudoku solver, I have 2 questions:
1) What would be faster?
A) Go through all the empty spots, have a list of numbers (1-9) remove them if it is in same line, or same category, then if it is length 1, add the only one remaining. Repeat this while needed.
B) Go through all the numbers, then check all the spots to see if they can have that number. Repeat this while needed.
2) What is the most efficient List for housing a list under 9 in length?
Thanks,
Legend
Answer 2) Not a list but a set would make sense. In this case BitSet.
Case 1) There are 27 rules in a 9x9 sudoku.
Case 1A) Every spot participates in 3 rules.
Case 1B) Every number is 9 times repeated; appears in 3 rules.
Answer 1) 1A and 1B should theoretically not differ, but 1A seems to make the algorithm and data structure easier.
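A minimal sketch of approach 1A with a BitSet, assuming the grid is an int[9][9] where 0 marks an empty spot and that the third rule for each spot is its 3x3 box. The class and method names are illustrative.

```java
import java.util.BitSet;

final class SudokuCandidates {
    /** Returns the set of values 1-9 still allowed at (row, col). */
    static BitSet candidates(int[][] grid, int row, int col) {
        BitSet allowed = new BitSet(10);
        allowed.set(1, 10); // bits 1..9 set; bit 0 is unused

        // Remove everything already present in the same row and column.
        for (int k = 0; k < 9; k++) {
            allowed.clear(grid[row][k]);
            allowed.clear(grid[k][col]);
        }

        // Remove everything already present in the same 3x3 box.
        int boxRow = row - row % 3;
        int boxCol = col - col % 3;
        for (int r = boxRow; r < boxRow + 3; r++) {
            for (int c = boxCol; c < boxCol + 3; c++) {
                allowed.clear(grid[r][c]);
            }
        }
        return allowed;
    }
}
```

When `cardinality()` of the result is 1, `nextSetBit(1)` gives the only remaining value for that spot.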
I think B works! You can use a backtracking algorithm to check the empty spot with any of the 1-9 numbers(in order). Fill the spot with first available choice(1-9) and move ahead. If at any point you are unable to insert a number into a slot then backtrack to the previous slot and try a different number.
This might be helpful :
http://edwinchan.wordpress.com/2006/01/08/sudoku-solver-in-c-using-backtracking/
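The backtracking idea described above can be sketched in Java as follows, assuming an int[9][9] grid with 0 for empty spots. This is a plain brute-force version without any candidate pruning, and the names are illustrative.

```java
final class SudokuBacktracking {
    /** Fills grid in place; returns false if no solution exists. */
    static boolean solve(int[][] grid) {
        for (int r = 0; r < 9; r++) {
            for (int c = 0; c < 9; c++) {
                if (grid[r][c] != 0) continue;
                // Try 1-9 in order in the first empty spot found.
                for (int v = 1; v <= 9; v++) {
                    if (allowed(grid, r, c, v)) {
                        grid[r][c] = v;
                        if (solve(grid)) return true;
                        grid[r][c] = 0; // undo and try the next number
                    }
                }
                return false; // nothing fits here, so backtrack
            }
        }
        return true; // no empty spot left
    }

    /** True if v does not already appear in the row, column, or 3x3 box. */
    static boolean allowed(int[][] grid, int row, int col, int v) {
        for (int k = 0; k < 9; k++) {
            if (grid[row][k] == v || grid[k][col] == v) return false;
        }
        int boxRow = row - row % 3;
        int boxCol = col - col % 3;
        for (int r = boxRow; r < boxRow + 3; r++) {
            for (int c = boxCol; c < boxCol + 3; c++) {
                if (grid[r][c] == v) return false;
            }
        }
        return true;
    }
}
```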

Best data structure to store and manipulate my data?

I am writing a simple Java program that reads a text file containing numbers that represent an (n x n) matrix, with the numbers separated by spaces. For example:
1 2 3 4
5 6 7 8
9 1 2 3
4 5 6 7
I then want to store these numbers in a data structure that I will use to manipulate the data (which will include comparing adjacent numbers and deleting certain numbers based on specific rules).
If a number is deleted, all the numbers above it fall down by one space.
For the example above, if I delete 8 and 9, the result would be:
() 2 3 ()
1 6 7 4
5 1 2 3
4 5 6 7
so the numbers fall down in their columns.
And lastly, the matrix given will always be square (so always n x n, where n will be always given and will always be positive), therefore, the data structure has to be flexible to virtually accept any n-value.
I was originally implementing it as a 2-d array, but I was wondering if someone had an idea of a better data structure that I could use in order to improve efficiency (something that will allow me to more quickly access all the adjacent numbers in the matrix, in both rows and columns).
Ultimately, my program will automatically check adjacent numbers against the rules, delete numbers, re-format the matrix, and keep going. In the end I want to be able to create an AI that will remove as many numbers from the matrix as possible in as few moves as possible, for any n x n matrix.
In my opinion, if you know the length of your array when you start, you are better off using an array. A simple data type will be easier to navigate (direct access). Then again, using LinkedLists, you will be able to remove a middle value without having to re-arrange the data inside your matrix. This will leave your "top" value as null. In your example:
null 2 3 null
1 6 7 4
5 1 2 3
4 5 6 7
Hope this helps.
You could use a one-dimensional array of size n*n.
int[] myMatrix = new int[n * n];
To access the element with coordinates (i,j), use myMatrix[i + j * n]. To make elements fall, use System.arraycopy to move lines.
Use special value (e.g. Integer.MIN_VALUE) as a mark for the () hole.
I expect it would be fastest and most memory efficient solution.
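A minimal sketch of this layout, assuming i is the row and j is the column so that each column occupies a contiguous run of the array and a fall becomes a single System.arraycopy. The class name, `HOLE`, and the method names are illustrative.

```java
final class FallingMatrix {
    static final int HOLE = Integer.MIN_VALUE; // marks an empty () cell

    private final int n;
    private final int[] cells; // column-major: cells[row + col * n]

    FallingMatrix(int[][] rows) {
        n = rows.length;
        cells = new int[n * n];
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                cells[r + c * n] = rows[r][c];
            }
        }
    }

    int get(int row, int col) {
        return cells[row + col * n];
    }

    /** Deletes the cell and lets everything above it in the column fall by one. */
    void delete(int row, int col) {
        int top = col * n; // index of row 0 in this column
        // Shift rows 0..row-1 down to rows 1..row; arraycopy handles the overlap.
        System.arraycopy(cells, top, cells, top + 1, row);
        cells[top] = HOLE;
    }
}
```

With the 4 x 4 example from the question, `delete(1, 3)` removes the 8 and `delete(2, 0)` removes the 9, leaving holes at the top of the first and last columns.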
Array access is pretty fast. Accessing adjacent elements is easy, as you just increment the relevant index(s) (being cognizant of boundaries). You could write methods to encapsulate those operations that are well tested. Having elements 'fall down' though might get complicated, but shouldn't be too bad if you modularize it out by writing well tested methods.
All that said, if you don't need the absolute best speed, there are other options.
You also might want to consider a modified circularly linked list. When implementing a sudoku solver, I used the structure outlined here. Looking at the image, you will see that this will allow you to modify your 2d array as you want, since all you need to do is move pointers around.
I'll post a screen shot of relevant picture describing the datastructure here, although I would appreciate it if someone will warn me if I am violating some sort of copy right or other rights of the author, in which case I'll take it down...
Try an array of LinkedLists.
If you want the numbers to fall automatically, I suggest you use a list for each column.
