Implementing A Nondeterministic Finite Automaton (NFA)

I'm trying to develop a simulation that executes a nondeterministic finite automaton in Java. The first command-line argument is a text file that defines the machine. The second argument is an input string. If the machine accepts the string, the program prints "accept" to standard output, followed by a list of the accept states it can end in. If it rejects, it outputs "reject" followed by a list of all possible end states.
For example, the text:
state 1 start
state 2
state 7 accept
transition 1 0 1
transition 1 1 1
transition 1 0 2
transition 2 0 2
transition 2 0 1
transition 2 1 1
transition 2 0 7
transition 7 0 7
transition 7 1 7
With an input string of 10, it would output
reject 1 2
and with an input string of 000, it would output
accept 7
I need advice on what data structures to use. I thought about using hashmaps and sets but I'm not sure.

I think you should transform your automaton into a deterministic one and then do the obvious.
There are three common ways of implementing a finite state machine in software:
Represent the transition function as a table (2D array) and upon each character read, look up the next state in it.
Use nested switch statements for the current state and input character to explicitly define the next state in code.
Use the State Pattern: Represent each state as a class with a method returning the next state, given an input character.
Since you'll need to build your automaton at run-time, the last two options don't really apply, so you should go with the lookup table. If your alphabet is small, you can write down all transitions. If the alphabet is huge, this might waste a lot of space and you can think about table compression which used to be an active field of research.
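To make the lookup-table option concrete, here is a minimal sketch (the class name TableDfa is hypothetical). The table is a DFA I worked out by hand from the question's NFA with the subset construction: rows 0 to 3 stand for the NFA state sets {1}, {1, 2}, {1, 2, 7} and {1, 7}, columns 0 and 1 stand for the symbols '0' and '1', and rows 2 and 3 accept because they contain state 7. Each character costs exactly one array lookup.

```java
// Table-driven DFA: transition[state][symbol] is the next state; '0' and '1' map to columns 0 and 1.
class TableDfa {
    private final int[][] transition;
    private final boolean[] accepting;
    private final int start;

    TableDfa(int[][] transition, boolean[] accepting, int start) {
        this.transition = transition;
        this.accepting = accepting;
        this.start = start;
    }

    boolean accepts(String input) {
        int state = start;
        for (char c : input.toCharArray()) {
            int symbol = c - '0';                        // column index; assumes a digit alphabet
            if (symbol < 0 || symbol >= transition[state].length) {
                return false;                            // character outside the alphabet
            }
            state = transition[state][symbol];           // one array lookup per character
        }
        return accepting[state];
    }
}
```

With the hand-derived table new int[][] {{1, 0}, {2, 0}, {2, 3}, {2, 3}}, accepting flags {false, false, true, true} and start row 0, input 10 ends in row 1, i.e. {1, 2}, which matches the question's "reject 1 2".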
For the Downvoters: You cannot write a deterministic program that “executes” a non-deterministic automaton. However, by a rather fundamental theorem of theoretical computer science, you can transform any non-deterministic finite state automaton into a deterministic one by a fully automated procedure, that is, a deterministic algorithm that can easily be implemented in software. Every computer science student should have done this procedure at least once using pencil and paper. If you do so, you'll quickly see how to keep track of which states of the original automaton went into which states of the deterministic one. Then, simply simulate the latter one, see in what state it ends up and output the states of the original non-deterministic automaton that constitute it.
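A sketch of that conversion, assuming the NFA is stored as a map from state to symbol to successor states (a hypothetical representation; the class name SubsetConstruction is also made up). Each DFA state is kept as the sorted set of NFA states it stands for, so after running the DFA you can print those original states directly. It assumes no epsilon transitions, which the question's file format doesn't have.

```java
import java.util.*;

// Subset construction: every DFA state is a set of NFA states.
class SubsetConstruction {
    final List<SortedSet<Integer>> dfaStates = new ArrayList<>();      // DFA index -> NFA states
    final List<Map<Character, Integer>> dfaDelta = new ArrayList<>();  // DFA index -> symbol -> DFA index

    SubsetConstruction(Map<Integer, Map<Character, Set<Integer>>> nfa, int start,
                       Collection<Character> alphabet) {
        Map<SortedSet<Integer>, Integer> index = new HashMap<>();
        Deque<SortedSet<Integer>> work = new ArrayDeque<>();
        SortedSet<Integer> first = new TreeSet<>(Set.of(start));
        index.put(first, 0);
        dfaStates.add(first);
        dfaDelta.add(new HashMap<>());
        work.add(first);
        while (!work.isEmpty()) {
            SortedSet<Integer> current = work.poll();
            int from = index.get(current);
            for (char c : alphabet) {
                // Union of the successors of every NFA state in the current set.
                SortedSet<Integer> next = new TreeSet<>();
                for (int s : current) {
                    next.addAll(nfa.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
                }
                if (next.isEmpty()) {
                    continue;                            // no transition: the DFA gets stuck and rejects
                }
                Integer to = index.get(next);
                if (to == null) {                        // first time we see this set: new DFA state
                    to = dfaStates.size();
                    index.put(next, to);
                    dfaStates.add(next);
                    dfaDelta.add(new HashMap<>());
                    work.add(next);
                }
                dfaDelta.get(from).put(c, to);
            }
        }
    }

    // Runs the DFA and returns the NFA states its final state stands for (empty if it got stuck).
    SortedSet<Integer> run(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            Integer next = dfaDelta.get(state).get(c);
            if (next == null) {
                return new TreeSet<>();
            }
            state = next;
        }
        return new TreeSet<>(dfaStates.get(state));
    }
}
```

For the question's machine with alphabet '0' and '1', this produces four DFA states, and run("000") returns [1, 2, 7]; intersecting that with the accept states gives the "accept 7" output.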

An NFA can be in a set of states at a time. So, to represent the current state of the NFA, use the Set interface.
The Set interface guarantees that there won't be any duplicate states in the current set. This is important, as an NFA can be in more than one state at a time. If you use another collection, this guarantee is not there.
In the case of an NFA, the number of paths (and therefore of duplicate states, if you don't deduplicate) can grow exponentially with the input length. But the set of states is always finite, and the Set interface guarantees that your current set will not be filled with duplicates.
For space and performance, you can use EnumSet, as EnumSet uses bit vectors to store its elements internally.
Algorithm:
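Initialise the current set to the start state.
Process the string from left to right, starting from index 0.
For the character at index i, replace the current set with the union of the transitions from every state in it on that character.
If, after the last character, the current set contains a final (accept) state, the string is accepted.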
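The steps above can be sketched in Java roughly as follows, assuming the machine file format from the question (the class and method names are hypothetical). It uses a HashMap from state to symbol to successor set, and a TreeSet for the current states so they print in ascending order. It uses Integer states rather than EnumSet, since the states are only known at run time from the file.

```java
import java.util.*;

class NfaSimulator {
    private final Map<Integer, Map<Character, Set<Integer>>> delta = new HashMap<>();
    private final Set<Integer> acceptStates = new TreeSet<>();
    private int startState;

    // Parses "state N [start] [accept]" and "transition FROM SYMBOL TO" lines.
    NfaSimulator(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts[0].equals("state")) {
                int s = Integer.parseInt(parts[1]);
                for (int i = 2; i < parts.length; i++) {
                    if (parts[i].equals("start")) startState = s;
                    if (parts[i].equals("accept")) acceptStates.add(s);
                }
            } else if (parts[0].equals("transition")) {
                int from = Integer.parseInt(parts[1]);
                char symbol = parts[2].charAt(0);
                int to = Integer.parseInt(parts[3]);
                delta.computeIfAbsent(from, k -> new HashMap<>())
                     .computeIfAbsent(symbol, k -> new HashSet<>())
                     .add(to);
            }
        }
    }

    // The set of states the NFA can be in after reading the whole input.
    SortedSet<Integer> run(String input) {
        SortedSet<Integer> current = new TreeSet<>();
        current.add(startState);
        for (char c : input.toCharArray()) {
            SortedSet<Integer> next = new TreeSet<>();   // a Set, so duplicates collapse
            for (int s : current) {
                next.addAll(delta.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
            }
            current = next;
        }
        return current;
    }

    // "accept" plus the accept states reached, or "reject" plus all end states.
    String result(String input) {
        SortedSet<Integer> end = run(input);
        SortedSet<Integer> accepted = new TreeSet<>(end);
        accepted.retainAll(acceptStates);
        StringBuilder sb = new StringBuilder(accepted.isEmpty() ? "reject" : "accept");
        for (int s : accepted.isEmpty() ? end : accepted) {
            sb.append(' ').append(s);
        }
        return sb.toString();
    }
}
```

In a main method you could print new NfaSimulator(Files.readAllLines(Path.of(args[0]))).result(args[1]); for the example machine, input 10 gives "reject 1 2" and input 000 gives "accept 7".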

Related

Backtracking search for constraint satisfaction problem implementation

For a homework assignment, my goal is to implement backtracking search using minimum remaining values, a degree heuristic, and forward checking. It needs to solve a Boolean satisfiability problem consisting of clauses of 3 Boolean variables OR'd together, where every clause must evaluate to true. As my current implementation stands, I believe it will eventually solve it, but it takes so long that I end up with a Java heap OutOfMemoryError.
With that out of the way, my implementation is as follows:
An ArrayList of the constraints
An array of values, indexed by variable, where each entry is true, false, or not yet assigned (initially all unassigned). The 0th element is used as a flag.
An array of the domain of each variable: either true or false, only true, only false, or no possible value (initially true or false)
An ArrayList of arrays, each array being a list of values; used to avoid trying the same thing twice
An array of the number of times each variable appears in a constraint: this is the degree heuristic
The backtracking search returns an array of values, and takes in an array of values and an array of domains
First, it checks each constraint in the list to make sure the domains of its variables can still make it true. If none of them work, it sets the 0th flag and exits. Additionally, if 2 variables have domains that force the third variable to be true in order to satisfy the clause, it changes the domain of that variable.
After it passes that step, it checks if each variable in the vals array has an assigned value, ending the program on a success.
Next it makes a temporary array to keep track of values it's tried, and begins a loop. Inside, it adds each variable that has a domain of the smallest domain found (MRV) and doesn't have a value/hasn't been tried, adding it to the list, and exiting the recursion if it can't find any. It then selects from the variables the one that appears the most based on the degree heuristic array. On a tie, it picks the last one that appears in the tie, and sets a flag so we don't try that variable again in the same recursion.
Still inside the loop, it first tries to set the domain and value of that variable to true if the domain isn't only false, and to false otherwise. It checks whether that value combination for the variables has been tried before, and if so, it reselects. If not, it adds that value combination to the list and makes a recursive call. If that recursive call returns a stop flag, it restores the selected variable's domain and value and tries again, this time setting the domain and value to false: first it adds the combination to the list of tried combinations (reselecting if it's already there) and resets the domains of all variables that don't have a value yet, then it makes the recursive call. If that also fails, it resets the selected variable's domain and value and tries to select a different variable in the same loop. The loop breaks once the search is complete or has failed for all combinations, and then returns the values array.
I can tell that it's not repeating itself but I don't know how I can fix/speed up my implementation so that it runs at a reasonable time.
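For comparison, here is a compact sketch of backtracking with forward checking, MRV and the degree heuristic for 3-literal clauses. It is not the code described above; the class name ThreeSatBacktracker and the clause encoding (int[3] literals, +v for "v is true", -v for "v is false") are assumptions. Forward checking here means: if a clause has two false literals, its third variable's domain shrinks to one value, so MRV picks that variable first; otherwise the unassigned variable with the highest degree is chosen. It keeps no list of tried combinations: each recursion level fixes one more variable and undoes it on return, so no partial assignment is revisited and memory stays proportional to the number of variables. A growing list of tried combinations is one plausible source of the heap growth, though I can't confirm that from the description alone.

```java
import java.util.*;

class ThreeSatBacktracker {
    private final List<int[]> clauses;   // each clause: three literals, +v = "v is true", -v = "v is false"
    private final int numVars;           // variables are numbered 1..numVars
    private final int[] degree;          // number of clauses each variable occurs in (degree heuristic)
    private final Boolean[] value;       // null = unassigned; index 0 unused

    ThreeSatBacktracker(List<int[]> clauses, int numVars) {
        this.clauses = clauses;
        this.numVars = numVars;
        this.degree = new int[numVars + 1];
        this.value = new Boolean[numVars + 1];
        for (int[] c : clauses) {
            for (int lit : c) degree[Math.abs(lit)]++;
        }
    }

    // Truth value of a literal under the current assignment, or null if its variable is unassigned.
    private Boolean literal(int lit) {
        Boolean v = value[Math.abs(lit)];
        if (v == null) return null;
        return lit > 0 ? v : !v;
    }

    // Forward check: false if some clause is already falsified. Otherwise stores in forced[0]
    // a literal whose clause has two false literals (0 if there is none).
    private boolean consistent(int[] forced) {
        forced[0] = 0;
        for (int[] c : clauses) {
            int unassigned = 0, lastFree = 0;
            boolean satisfied = false;
            for (int lit : c) {
                Boolean lv = literal(lit);
                if (lv == null) { unassigned++; lastFree = lit; }
                else if (lv) { satisfied = true; break; }
            }
            if (satisfied) continue;
            if (unassigned == 0) return false;           // all three literals false
            if (unassigned == 1) forced[0] = lastFree;   // that variable has one value left
        }
        return true;
    }

    boolean solve() {
        int[] forced = new int[1];
        if (!consistent(forced)) return false;           // forward check failed: backtrack
        int var;
        boolean[] order;
        if (forced[0] != 0) {
            var = Math.abs(forced[0]);
            order = new boolean[] {forced[0] > 0};       // MRV: domain of size one
        } else {
            var = 0;
            for (int v = 1; v <= numVars; v++) {
                if (value[v] == null && (var == 0 || degree[v] > degree[var])) var = v;
            }
            if (var == 0) return true;                   // all assigned, no clause falsified
            order = new boolean[] {true, false};
        }
        for (boolean b : order) {
            value[var] = b;
            if (solve()) return true;
        }
        value[var] = null;                               // undo before backtracking
        return false;
    }

    Boolean[] assignment() {
        return value.clone();
    }
}
```

Recursion depth is at most the number of variables; for very large instances you would want an explicit stack instead.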

The first step is to create a knowledge model for the domain. This can be done with a domain-specific language (DSL). In the DSL syntax, a possible solution to the problem is formulated. A desirable side effect of a domain-specific language is that a grammar has to be created which can parse the language. This grammar can also be used in reverse, which is called a "generative grammar". The aim is to include as much domain knowledge as possible in the DSL, which makes it easier for the solver to test out all states.
Unfortunately, the question gives little information about the domain itself. According to the description, there are three variables which can be on or off. This would generate a state space of 2x2x2=8, which seems a bit too easy for a domain, because the solver is done once it has tested all 8 states. So I would guess that the problem is a bit harder but not explained in the description. Nevertheless, the first step is to convert the problem into a language which can be parsed by a grammar.

How to efficiently store Role-Playing Item properties in Java?

I'm designing a text based RPG where random items are spawned in when looting chests for example.
Currently I'm storing my item data in a text file. When a random item has to be generated, a scanner reads the file, collects the values, takes probability into account, and then creates a new object accordingly.
Here is an example of some items of the 'Consumable' class.
The values are listed in this order:
name, probability, level at which the item gets added to the item pool, weight, value, +health, +damage, Effect
Example of the text file:
Twinkie 10 1 1 10 10 10 0
Banana 10 1 1 5 5 0 0
Potato 20 1 1 5 5 0 0
Protein_Shake 5 5 1 30 10 10 1
Beer 5 5 1 5 10 10 1
If the Effect value equals 0, a new default Consumable gets created with effect 'null'.
If the Effect value is 1, a function uses a switch(name) statement to find the effect belonging to the item and then passes the effect as an argument to the Consumable constructor.
I'm certain that this is a very unorganized and inefficient way to go about this sort of thing. Does anyone have a suggestion on how to handle this? I want to do it right.
Should I maybe create a ConsumablePool class where I create all the items immediately, or store the item information elsewhere?

There are a few ways to optimize your current task, which seems to be:
Consumable nextItem = getRandomConsumable(candidates); // candidates is a List<Consumable>
Perhaps you must also choose based on the current level:
Consumable nextItem = getRandomConsumable(level, candidates); // level is an int
Currently, you state that each time you call getRandomConsumable() you generate the list of candidates from a disk file (you seem to have already written this code). Your instinct is correct: the 'cost' of this operation is relatively high, because reading from disk is much slower than reading from an in-memory cache of objects. Assuming that the disk file does not change during the game, your application should create the candidate list once (at startup) and reuse this List each time the next item must be chosen.
In the case where candidates are based on level, further optimizations can be done. Consider dividing the candidates by level, such as:
List<Consumable> candidates = createCacheFromDisk(); // Run at startup
Map<Integer, List<Consumable>> itemsByLevelMap = candidates.stream().collect(Collectors.groupingBy(Consumable::getLevel)); // needs import java.util.stream.Collectors
This will further break out the list of candidates into Lists by level, allowing
List<Consumable> level1Items = itemsByLevelMap.get(1);
This general approach of caching will greatly improve the performance of your application; choosing a random item from a List which has already been generated/cached performs far better than generating a new List (from disk) each time.
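A minimal sketch of the cache-once idea, assuming the column layout from the question. The Consumable record here is a stand-in for the question's class (it needs Java 16+), ConsumablePool is the hypothetical class the question mentions, and the effect handling is left out. Two further assumptions: an item is available once the player's level is at least the item's level column, and the probability column is a relative weight.

```java
import java.util.*;
import java.util.stream.Collectors;

// Stand-in record mirroring the columns of the question's text file.
record Consumable(String name, int probability, int level, int weight, int value,
                  int health, int damage, int effect) {

    // Parses one line such as "Beer 5 5 1 5 10 10 1".
    static Consumable parse(String line) {
        String[] f = line.trim().split("\\s+");
        return new Consumable(f[0], Integer.parseInt(f[1]), Integer.parseInt(f[2]),
                Integer.parseInt(f[3]), Integer.parseInt(f[4]), Integer.parseInt(f[5]),
                Integer.parseInt(f[6]), Integer.parseInt(f[7]));
    }
}

// Parses the item data once (e.g. at startup); afterwards items are only picked from memory.
class ConsumablePool {
    private final List<Consumable> items;
    private final Random rng;

    ConsumablePool(List<String> lines, Random rng) {
        this.items = lines.stream()
                .filter(line -> !line.isBlank())
                .map(Consumable::parse)
                .collect(Collectors.toList());
        this.rng = rng;
    }

    // Picks an item unlocked at or below the given level, weighted by its probability column.
    Optional<Consumable> getRandomConsumable(int level) {
        List<Consumable> candidates = items.stream()
                .filter(c -> c.level() <= level)
                .collect(Collectors.toList());
        int total = candidates.stream().mapToInt(Consumable::probability).sum();
        if (total <= 0) {
            return Optional.empty();            // nothing unlocked yet
        }
        int roll = rng.nextInt(total);          // uniform in [0, total)
        for (Consumable c : candidates) {
            roll -= c.probability();
            if (roll < 0) {
                return Optional.of(c);
            }
        }
        return Optional.empty();                // not reached when total > 0
    }
}
```

At startup you would build the pool once, for example from Files.readAllLines(...) on the item file. If filtering by level on every call ever shows up in a profile, the per-level grouping described above can replace the filter.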

Encog Neural Net - How to structure training data?

Every example I've seen for Encog neural nets has involved XOR or something very simple. I have around 10,000 sentences and each word in the sentence has some type of tag. The input layer needs to take 2 inputs, the previous word and the current word. If there is no previous word, then the 1st input is not activated at all. I need to go through each sentence like this. Each word is contingent on the previous word, so I can't just have an array that looks similar to the XOR example. Furthermore, I don't really want to load all the words from 10,000+ sentences into an array, I'd rather scan one sentence at a time and once I reach EOF, start back at the beginning.
How should I go about doing this? I'm not super comfortable with Encog because all the examples I've seen have either been XOR or extremely complicated.
There are 2 inputs... Each input consists of 30 neurons. The probabilities of the word having each tag are used as inputs. So, most of the neurons get 0, and the others get probability inputs like .5, .3, and .2. When I say 'aren't activated', I just mean that all the neurons are set to 0. The output layer represents all the possible tags, so it's 30 neurons. Whichever output neuron has the highest value determines the tag that is chosen.
I'm not sure how to go through all 10,000 sentences and look up each word in each sentence (to build the inputs and activate that input) in the style of the Encog 'demos' I've seen.
It seems that the networks are trained with a single array holding all training data, and that is looped through until the network is trained. I would like to train the network with many different arrays (an array per sentence) and then look through them all again.
This format is clearly not going to work for what I'm doing:
int epoch = 1; // train is the Encog MLTrain, e.g. a ResilientPropagation over the training set
do {
    train.iteration();
    System.out.println("Epoch #" + epoch + " Error:" + train.getError());
    epoch++;
} while (train.getError() > 0.01);

So, I'm not sure how to tell you this, but that's not how a neural net works. You can't just use a word as an input, and you can't just "not activate" an input either. At a very basic level, this is what you need to run a neural network on a problem:
A fixed-length input vector (whatever you are feeding in, it must be represented numerically with a fixed length. Each entry in the vector is a single number)
A set of labels (each input vector must correspond to a single, fixed-length output vector)
Once you have those two, the neural net classifies an example, then edits itself to get as close as possible to the labels.
If you're looking to work with words and a deep learning framework, you should map your words to an existing vector representation (I would highly recommend GloVe, but word2vec is decent as well) and then learn on top of that representation.
After having a deeper understanding of what you're attempting here I think the issue is that you're dealing with 60 inputs, not one. These inputs are the concatenation of the existing predictions for both words (in the case with no first word the first 30 entries are 0). You should take care of the mapping yourself (should be very straightforward), and then just treat it as trying to predict 30 numbers with 60 numbers.
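The mapping itself is small. Here is a minimal sketch in plain Java (no Encog types; the class name InputEncoder is hypothetical) that turns the two 30-number tag-probability vectors into one 60-number input, leaving the first 30 entries at 0 for the first word of a sentence. It processes one sentence at a time, so you never need all 10,000 sentences in memory.

```java
import java.util.*;

// Builds the 60-number input vector from two 30-number tag-probability vectors.
class InputEncoder {
    static final int TAGS = 30;

    // prev is null for the first word of a sentence; its 30 slots then stay 0.0.
    static double[] encode(double[] prev, double[] current) {
        if (current.length != TAGS || (prev != null && prev.length != TAGS)) {
            throw new IllegalArgumentException("expected " + TAGS + " tag probabilities per word");
        }
        double[] input = new double[2 * TAGS];               // new arrays start out all zeros
        if (prev != null) {
            System.arraycopy(prev, 0, input, 0, TAGS);       // slots 0..29: previous word
        }
        System.arraycopy(current, 0, input, TAGS, TAGS);     // slots 30..59: current word
        return input;
    }

    // One input row per word of a single sentence.
    static List<double[]> encodeSentence(List<double[]> tagProbsPerWord) {
        List<double[]> rows = new ArrayList<>();
        double[] prev = null;
        for (double[] current : tagProbsPerWord) {
            rows.add(encode(prev, current));
            prev = current;
        }
        return rows;
    }
}
```

Each resulting double[] could then be wrapped in whatever MLData type your Encog training set uses, paired with the 30-number target vector for that word.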
I feel obliged to tell you that, the way you've framed the problem, you will see awful performance. When dealing with a sparse (mostly zeros) vector and such a small dataset, deep learning techniques will show VERY poor performance compared to other methods. You are better off using GloVe + an SVM or a random forest model on your existing data.

You can use other implementations of MLDataSet besides BasicMLDataSet.
I ran into a similar problem with windows of DNA sequences. Building an array of all the windows would not have been scalable.
Instead, I implemented my own VersatileDataSource, and wrapped it in a VersatileMLDataSet.
VersatileDataSource has just a few methods to implement:
public interface VersatileDataSource {
    String[] readLine();
    void rewind();
    int columnIndex(String name);
}
For each readLine(), you could return the inputs for the previous/current word, and advance the position to the next word.
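A sketch of such a source for sentences. To keep it compilable without Encog on the classpath, it only mirrors the three methods; in a real project the class would declare that it implements VersatileDataSource. Several assumptions here are mine: the class name SentenceWindowSource, the "word/TAG" token format, the column names, and readLine() returning null at the end of the data (which is what I believe Encog's CSV-backed source does; check this against the Encog docs). The sentences are held in a List for brevity; a BufferedReader over the file, reopened in rewind(), would keep only one sentence in memory.

```java
import java.util.*;

// Streams (previous word, current word, tag) rows one sentence at a time.
final class SentenceWindowSource {
    private static final String[] COLUMNS = {"prevWord", "word", "tag"};
    private final List<String> sentences;   // one sentence per entry, tokens like "cat/NN" (assumed format)
    private int sentenceIdx;
    private String[] tokens = new String[0];
    private int tokenIdx;

    SentenceWindowSource(List<String> sentences) {
        this.sentences = sentences;
        rewind();
    }

    // Returns {previous word ("" at sentence start), current word, tag}, or null when all data is read.
    public String[] readLine() {
        while (tokenIdx >= tokens.length) {      // current sentence exhausted: load the next one
            if (sentenceIdx >= sentences.size()) {
                return null;
            }
            String s = sentences.get(sentenceIdx++).trim();
            tokens = s.isEmpty() ? new String[0] : s.split("\\s+");
            tokenIdx = 0;
        }
        String prev = tokenIdx == 0 ? "" : word(tokens[tokenIdx - 1]);
        String tok = tokens[tokenIdx++];
        return new String[] {prev, word(tok), tag(tok)};
    }

    // Starts again from the first sentence (for the next training epoch).
    public void rewind() {
        sentenceIdx = 0;
        tokens = new String[0];
        tokenIdx = 0;
    }

    public int columnIndex(String name) {
        return Arrays.asList(COLUMNS).indexOf(name);
    }

    private static String word(String token) {
        int i = token.lastIndexOf('/');
        return i < 0 ? token : token.substring(0, i);
    }

    private static String tag(String token) {
        int i = token.lastIndexOf('/');
        return i < 0 ? "" : token.substring(i + 1);
    }
}
```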

How do I apply a sequence of insert/delete-character operations on a string?

I have a text like this:
My name is Bob and I live in America.
I have some reference to the characters of this string, for example:
from 3 to 7 chars, deleted
at 3 char, added "surname"
from 20 to 25, deleted
at 25 char ....
but these statements aren't ordered (and I can't order them).
So, this is the question: how can I modify the text without losing the reference of the characters?
For example, if I apply the first statement, my text becomes:
My is Bob and I live in America.
and my third statement doesn't work correctly anymore, because I've lost the reference to the original 20th character.
Keep in mind that the text is pretty long, so I can't use any indexes...

First off, if this statement is true, the situation is hopeless:
but these statements aren't ordered (and I can't order them).
An unordered list of patch statements could lead to a conflict. It will not be possible to decide what the right answer is in an automated fashion. For instance, consider the following situation:
       0         1         2
index: 012345678901234567890
text: "apple banana coconuts"
- delete 5 characters from index 10
- add "xyz" at index 10
- delete 10 characters from index 5
You will wind up with different results depending on what order you execute these statements.
For instance, if you apply (1), then (2), then (3), you wind up with "apple banaconuts" --> "apple banaxyzconuts" --> "applenuts".
But if you apply (3), then (2), then (1), you wind up with "appleconuts" --> "appleconutxyzs" --> [error -- there aren't enough characters to delete!].
Either you need a repeatable, agreed-upon ordering of the statements, or you cannot proceed any further. Even worse, discovering which orderings are valid (for example, eliminating all orderings where an impossible statement occurs, like "delete 10 characters from index 20", when there is no index 20) can, in the worst case, mean trying all n! orderings of the n statements.
If it turns out that the patches can be applied in a specific order (or at least in a repeatable, agreed-upon, deterministic order), the situation improves but is still obnoxious. Because the indices in any "patch" could be invalidated by any previous one, it's not going to be possible to straightforwardly apply each statement. Instead, you'll have to implement a small, pseudo-diff. Here's how I'd do it:
Scan through the list of patch-statements. Build a list of operations. An operation is an object with a command and some optional arguments, and an index to apply that command to. For example, your operations might look like:
DeleteOperation(index 3, length 4)
AddOperation(index 3, text "surname")
DeleteOperation(index 20, length 5)
As you perform operations, keep a reference to the original string and store a "dirty pointer". This is the latest contiguous index in the string which has had no operations performed on it. Any operation you perform whose index exceeds the dirty pointer must first be pre-processed.
If you encounter a clean operation, one whose index is less than or equal to the dirty pointer, you can apply it with no further work. The dirty pointer now moves to that operation's index.
If you encounter a dirty operation, one whose index is greater than the dirty pointer, you'll have to do some work before you can apply it. Determine the real index of where the operation should be applied by looking at the previous operations, then make the appropriate offset and apply it.
Execute each operation in turn until there are no more operations to execute.
The result is your transformed string.
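A sketch of the "keep a reference to the original string" idea, using a simpler bookkeeping scheme than the dirty pointer: every surviving character remembers its original index (-1 for inserted text), so an operation phrased in original indices can always find its real position. The class, record, and method names are hypothetical, the Delete/Add shapes follow the DeleteOperation/AddOperation examples above, and it needs Java 16+. Operations are applied in the order given; for non-overlapping edits like the question's, the final text comes out the same in any order, but conflicting edits like the apple example still need an agreed order.

```java
import java.util.*;

// Applies edits whose indices refer to the ORIGINAL string, in the order they are passed to apply().
class OriginalIndexEditor {
    interface Op {}
    record Delete(int index, int length) implements Op {}   // removes original chars [index, index + length)
    record Add(int index, String text) implements Op {}     // inserts before original char `index`

    private final StringBuilder text;
    private final List<Integer> origin = new ArrayList<>();  // origin.get(k) = original index of text.charAt(k)

    OriginalIndexEditor(String original) {
        text = new StringBuilder(original);
        for (int i = 0; i < original.length(); i++) {
            origin.add(i);
        }
    }

    // Current position of the first surviving original character at or after originalIndex.
    private int realIndex(int originalIndex) {
        for (int k = 0; k < origin.size(); k++) {
            if (origin.get(k) >= originalIndex) {
                return k;
            }
        }
        return origin.size();                                // past the end: append
    }

    void apply(Op op) {
        if (op instanceof Delete d) {
            // Walk backwards so removals don't shift positions we haven't visited yet.
            for (int k = origin.size() - 1; k >= 0; k--) {
                int o = origin.get(k);
                if (o >= d.index() && o < d.index() + d.length()) {
                    origin.remove(k);
                    text.deleteCharAt(k);
                }
            }
        } else if (op instanceof Add a) {
            int at = realIndex(a.index());
            text.insert(at, a.text());
            for (int k = 0; k < a.text().length(); k++) {
                origin.add(at, -1);                          // inserted characters have no original index
            }
        }
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

Applying Delete(3, 5), Add(3, "surname ") and Delete(29, 7) to "My name is Bob and I live in America." gives "My surname is Bob and I live in ." whichever order the three arrive in. Each lookup is a linear scan; for very long texts a tree keyed by original index would be the next step.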

You will just have to track what you are doing to the string and add or subtract from future item indexes in your list of commands.

Each time you execute a statement go over the whole list and modify indexes appropriately.

Sounds like you are doing Operational Transformation.
This article discusses it in theory and practice (in quite some depth):
Understanding and Applying Operational Transformation
If they aren't in any order, though, how can you apply them? How do you know whether one operation should be applied before or after another? Are none of the operations additive? (i.e., do all of the operations only apply to the original String?)

What exactly are you trying to do here?
If you are trying to apply these "references" to your text,
My name is Bob and I live in America.
Keep this data unaltered. Copy this to another string, and apply your "reference" there every time you need to modify it.

In Java, which is better: three arrays of booleans or one array of bytes?

I know the question sounds silly, but consider this: I have an array of ints (1..N) and a labelling algorithm. At any point, the item each int represents is in one of three states. The current version holds these states in a byte array, where 0, 1 and 2 represent the three states. Alternatively, I could have three arrays of booleans, one for each state. Which is better (consumes less memory) depends on how the JVM (Sun's version) stores the arrays: is a boolean represented by 1 bit? Is there any other magic happening behind the scenes? (P.S. Don't start with all that "this is not the way OO/Java works"; I know, but here performance comes first. Plus, the algorithm is simple and perfectly readable even in this form.)
Thanks a lot

Instead of two booleans or 1 int, just use a BitSet - http://java.sun.com/j2se/1.4.2/docs/api/java/util/BitSet.html
You can then have two bits per label/state. And since BitSet is a standard Java class, you are likely to get good performance.
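A minimal sketch of the two-bits-per-element layout with java.util.BitSet (the class name TwoBitStates is hypothetical). State k of element i is stored in bits 2i and 2i + 1, so the three states 0, 1 and 2 fit in two bits per element instead of one byte.

```java
import java.util.BitSet;

// Packs one of three states (0, 1, 2) per element into two bits of a single BitSet.
class TwoBitStates {
    private final BitSet bits;
    private final int size;

    TwoBitStates(int size) {
        this.size = size;
        this.bits = new BitSet(2 * size);                // two bits per element
    }

    void set(int i, int state) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i);
        if (state < 0 || state > 2) throw new IllegalArgumentException("state must be 0, 1 or 2");
        bits.set(2 * i, (state & 1) != 0);               // low bit
        bits.set(2 * i + 1, (state & 2) != 0);           // high bit
    }

    // Elements that were never set read as state 0.
    int get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException(i);
        return (bits.get(2 * i) ? 1 : 0) | (bits.get(2 * i + 1) ? 2 : 0);
    }
}
```

The trade-off is a little bit-twiddling on every access in exchange for a quarter of the memory of a byte array.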

Theoretically, with 3 boolean arrays you'll need to do:
firstState[n] = false;
secondState[n] = true;
thirdState[n] = false;
every time you want to change the n-th element's state. That is three indexed accesses and three assignments.
With 1 byte array you'll need:
elements[n] = 1;
It's more readable and, in theory, does a third of the work (one indexed store instead of three). Another advantage of this solution is that you can easily add as many new states as you want (with boolean arrays you'd need to introduce new arrays).
But I don't think you'll ever see the performance difference.
UPD: actually I'd do it the more Java way (even though you asked us not to push OO) and use an array of enums. This makes the code much clearer and gives you some flexibility (maybe in the future you'll decide OOP is not such a bad thing):
enum ElementState {
    FIRST, SECOND, THIRD;
}
ElementState[] elementStates = new ElementState[N];
...
elementStates[i] = ElementState.FIRST;

The JVM second edition spec (http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html) specifies that boolean values in arrays are encoded as 0 and 1, but doesn't specify the storage type used. So a particular JVM may or may not use a single bit; it could use an int.
However, if performance is paramount, a single byte array would seem to be your best option anyway.
EDIT: I incorrectly said that boolean arrays are stored as bit arrays - this is possible but implementation specific.

If you want a guaranteed minimum you could use three java.util.BitSets. These will only use one bit per flag (though you will have the extra object overhead, that may outweigh the benefits if the number of flags is small.) I would say if you have a large number of objects BitSet may be a better alternative, otherwise an array of byte constants or enums will lead to more readable code (and the extra storage shouldn't be a real concern.)
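A sketch of the three-BitSet layout (the class name ThreeFlagStates is hypothetical). Each element's bit is set in exactly one of the three BitSets once it has been assigned, which costs three bits per element plus object overhead, but it makes queries like "how many elements are in state k" a single cardinality() call.

```java
import java.util.BitSet;

// One BitSet per state; element i's bit is set in exactly one of them once it has been assigned.
class ThreeFlagStates {
    private final BitSet[] flags = {new BitSet(), new BitSet(), new BitSet()};

    void set(int i, int state) {
        if (state < 0 || state >= flags.length) {
            throw new IllegalArgumentException("state must be 0, 1 or 2");
        }
        for (int s = 0; s < flags.length; s++) {
            flags[s].set(i, s == state);                  // set the chosen flag, clear the others
        }
    }

    // Returns the element's state, or -1 if it has never been assigned.
    int get(int i) {
        for (int s = 0; s < flags.length; s++) {
            if (flags[s].get(i)) {
                return s;
            }
        }
        return -1;
    }

    // Number of elements currently in the given state.
    int count(int state) {
        return flags[state].cardinality();
    }
}
```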

The array of bytes is much better!
A boolean typically takes a whole byte (HotSpot, for example, stores each boolean[] element in 1 byte), so with three arrays you use 3 bytes per item, whereas one byte array needs only 1 byte (in theory you can reduce it to single bits; see other posts).
With a byte array, you simply change the entry to the byte you want. With three arrays, you have to change the value in every array!
As your application develops, you may need an extra state. That means creating yet another array, and then every update has to change 4 values (see the second point).
So, I hope we persuaded you!
