Backtracking search for constraint satisfaction problem implementation - java

For a homework assignment, my goal is to implement a backtracking search using minimum remaining values (MRV), a degree heuristic, and forward checking. It needs to solve a Boolean satisfiability problem consisting of clauses of 3 Boolean variables OR'd together, where every clause must evaluate to true. As my current implementation stands, I believe it would eventually find a solution, but it takes so long that I end up with a Java heap out-of-memory error.
With that out of the way, my implementation is as follows:
An ArrayList of the constraints
An array of values, indexed by variable, each either true, false, or not yet assigned (initially all unassigned). The 0th element is used as a flag.
An array for the domain of each variable: true-or-false, only true, only false, or no possible value (initially true-or-false)
An ArrayList of arrays, each array being a combination of values; used to avoid trying the same assignment twice
An array counting how many constraints each variable appears in: this is the degree heuristic
The back search returns an array of values, and takes in an array of values and an array of domains
First it checks each constraint in the list to make sure the domains of its variables still allow it to be made true. If none of them work, it sets the 0th flag and exits. Additionally, if two variables have domains that force the third variable to be true in order to satisfy the constraint, it narrows the domain of that variable.
After it passes that step, it checks if each variable in the vals array has an assigned value, ending the program on a success.
Next it makes a temporary array to keep track of variables it's tried, and begins a loop. Inside, it collects every variable that has the smallest domain found (MRV) and doesn't yet have a value and hasn't been tried, exiting the recursion if it can't find any. From those candidates it selects the one that appears in the most constraints, per the degree heuristic array. On a tie, it picks the last one in the tie, and sets a flag so that variable isn't tried again in the same recursion.
Still inside the loop, it first tries to set the selected variable's domain and value to true, unless the domain is only false, in which case it uses false. It checks whether that combination of values has been tried before; if it has, it reselects. If it hasn't, it records the combination and recurses. If the recursive call returns the stop flag, it restores the selected variable's domain and value and tries again, this time with false: it first records that combination (reselecting if it's already recorded), resets the domains of all variables that don't have a value yet, then recurses again. If that also fails, it restores the selected variable's domain and value and tries a different variable in the same loop. The loop breaks once it has succeeded or has failed for all combinations, and then the values array is returned.
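For concreteness, here is a heavily condensed sketch of this style of search (not the poster's code; the class, names, and literal encoding are assumptions) over 3-CNF clauses, where a literal +v means variable v is true and -v means it is false. Forward checking here takes the form of unit propagation, which subsumes the "two variables force the third" rule:

```java
import java.util.List;

class TinySat {
    // vals[v]: -1 = unassigned, 0 = false, 1 = true (index 0 unused)
    static boolean solve(List<int[]> clauses, int[] vals, int nVars) {
        // Forward checking: a clause whose literals are all falsified fails;
        // a clause with exactly one unassigned literal forces that literal.
        for (int[] c : clauses) {
            int free = 0, freeCount = 0;
            boolean satisfied = false;
            for (int lit : c) {
                int v = Math.abs(lit);
                if (vals[v] == -1) { free = lit; freeCount++; }
                else if ((lit > 0) == (vals[v] == 1)) satisfied = true;
            }
            if (satisfied) continue;
            if (freeCount == 0) return false;          // dead end: backtrack
            if (freeCount == 1) {                      // forced assignment
                int v = Math.abs(free);
                vals[v] = free > 0 ? 1 : 0;
                if (solve(clauses, vals, nVars)) return true;
                vals[v] = -1;
                return false;
            }
        }
        // Variable selection: with Boolean domains MRV rarely discriminates,
        // so pick the unassigned variable occurring in the most clauses (degree).
        int[] deg = new int[nVars + 1];
        for (int[] c : clauses) for (int lit : c) deg[Math.abs(lit)]++;
        int best = -1;
        for (int v = 1; v <= nVars; v++)
            if (vals[v] == -1 && (best == -1 || deg[v] > deg[best])) best = v;
        if (best == -1) return true;                   // all assigned, all clauses satisfied
        for (int val : new int[]{1, 0}) {              // try true, then false
            vals[best] = val;
            if (solve(clauses, vals, nVars)) return true;
        }
        vals[best] = -1;
        return false;
    }
}
```

Note that this keeps no list of tried value combinations at all: because each recursion level owns exactly one variable and undoes its assignment on failure, the search tree itself guarantees no assignment is visited twice, which is likely where the memory blow-up in the description comes from.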
I can tell that it's not repeating itself but I don't know how I can fix/speed up my implementation so that it runs at a reasonable time.

The first step is to create a knowledge model for the domain. This can be done with a domain-specific language (DSL). A possible solution to the problem is formulated in the DSL syntax. The desired side effect of a domain-specific language is that a grammar has to be created which can parse the language. This grammar can be used in the reverse direction, which is called a "generative grammar". The aim is to encode as much domain knowledge as possible in the DSL, which makes it easier for the solver to test all states.
Unfortunately, the question gives only a little information about the domain itself. According to the description there are three variables which can be on or off. That would give a state space of 2x2x2=8, which seems a bit too easy for a domain, because the solver is done once it has tested all 8 states. So I would guess that the problem is a bit harder than the description lets on. Nevertheless, the first step is to convert the problem into a language which can be parsed by a grammar.

Related

Comparing sets of randomly assigned codes in Java to assign a name

Good day,
I honestly do not know how to phrase the problem in the title, thus the generic description. Actually I have a set of ~150 codes, which are combined to produce a single string, like "a_b_c_d". Valid combinations contain 1-4 codes, plus the '-' character for any slot with no value assigned, and each code is only used once ("a_a..." is not considered valid). Each set of codes is then assigned to a unique name. Not all combinations are logical, but if a combination is valid then the order of the codes does not matter (if "f_g_e_-" is valid, then "e_g_f_-" and "e_f_-_g" are valid, and they all have the same name). I have taken the time to assign each valid combination to its unique name, and tried to create a single parser to read these values and produce the name.
I think the problem is apparent. Since the order does not matter, I have to check for every possible combination. The codes cannot be strictly ordered, since some codes have meaning in any position, so this is impossible to accomplish with a simple parser. Is there an optimal way to do this, or will I have to force the user to use some kind of order against the standard?
Try using a TreeMap to store each code (String) and its count (int), incrementing the count for a code every time it is encountered in the string.
After processing the whole string, if you find the count for any code is greater than 1, the string has repeated codes and is invalid; otherwise it is valid.
Traversing the TreeMap visits keys in sorted order, so you can traverse it to generate a canonical, sorted code sequence.
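A sketch of the TreeMap idea above. Splitting on '_' and skipping the "-" placeholder follow the question's format; the class and method names are made up:

```java
import java.util.Map;
import java.util.TreeMap;

class CodeKey {
    // Returns a canonical sorted key for a valid combination, or null if a
    // code repeats (invalid). The key can then look up the unique name.
    static String canonicalKey(String combined) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String code : combined.split("_")) {
            if (code.equals("-")) continue;        // empty slot, ignore
            counts.merge(code, 1, Integer::sum);   // count occurrences
        }
        for (int n : counts.values())
            if (n > 1) return null;                // repeated code -> invalid
        return String.join("_", counts.keySet());  // TreeMap keys come out sorted
    }
}
```

Since "f_g_e_-", "e_g_f_-", and "e_f_-_g" all canonicalize to the same key, a single Map<String, String> from canonical key to name replaces the need for a per-permutation parser.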

Fuzzy Matching Duplicates in Java

I have a List<String[]> of customer records in Java (from a database). I know from manually eyeballing the data that 25%+ are duplicates.
The duplicates are far from exact though. Sometimes they have different zips, but the same name and address. Other times the address is missing completely, etc...
After a day of research, I'm still really stumped as to how to even begin to attack this problem.
What are the "terms" I should be googling for that describe this area (from a solve-this-in-Java perspective)? And I don't suppose there is a fuzzymatch.jar out there that makes it all just too easy?
I've done similar systems before for matching place information and people information. These are complex objects with many features and figuring out whether two different objects describe the same place or person is tricky. The way to do it is to break it down to the essentials.
Here's a few things that you can do:
0) If this is a one-off, load the data into OpenRefine and fix things interactively. At best this solves your problem; at worst it will show you where your possible matches are.
1) There are several ways you can compare strings. Basically they differ in how prone they are to false positives and false negatives. A false positive is when two strings match but shouldn't have; a false negative is when they should match but don't. String equality will not produce false positives but will miss a lot of potential matches due to slight variations. Levenshtein distance with a small threshold is slightly better. N-grams produce a lot of matches, but many of them will be false positives. There are a few more algorithms; take a look at e.g. the OpenRefine code to find various ways of comparing and clustering strings. Lucene implements a lot of this in its analyzer framework but is a bit of a beast to work with if you are not very familiar with its design.
2) Separate the process of comparing stuff from the process of deciding whether you have a match. What I did in the past was qualify my comparisons using a simple numeric score, e.g. this field matched exactly (100), but that field was a partial match (75), and that field did not match at all (0). The resulting vector of qualified comparisons, e.g. (100, 75, 0, 25), can be compared to a reference vector that defines your perfect or partial match criteria. For example, if first name, last name, and street match, the two records are the same regardless of the rest of the fields. Or if phone numbers and last names match, that's a valid match too. You can encode such perfect matches as vectors and then simply compare them with your comparison vector to determine whether it was a match, not a match, or a partial match. This is sort of a manual version of what machine learning does, which is to extract vectors of features and then build a probability model of which vectors mean what from reference data. Doing it manually can work for simple problems.
3) Build up a reference data set with test cases that you know to match or not match, and evaluate your algorithm against that reference set. That way you will know whether you are improving things or making them worse when you tweak, e.g., the threshold that goes into Levenshtein or whatever.
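A minimal sketch of point 2, assuming four fields (first name, last name, street, phone) and made-up rule thresholds; the similarity function here is deliberately crude and would be replaced by Levenshtein or n-gram scoring in practice:

```java
class RecordMatcher {
    // Crude field similarity: 100 exact (case-insensitive), 75 if one string
    // contains the other, 0 otherwise. Empty/missing fields score 0.
    static int fieldScore(String a, String b) {
        if (a == null || b == null || a.isEmpty() || b.isEmpty()) return 0;
        if (a.equalsIgnoreCase(b)) return 100;
        String la = a.toLowerCase(), lb = b.toLowerCase();
        if (la.contains(lb) || lb.contains(la)) return 75;
        return 0;
    }

    // Reference vectors: a pair matches if, for some rule, every field's
    // score meets that rule's minimum. Field order: first, last, street, phone.
    static final int[][] RULES = {
        {100, 100, 100, 0},   // first + last + street all exact
        {0, 100, 0, 100},     // last + phone exact
    };

    static boolean isMatch(String[] a, String[] b) {
        int[] scores = new int[a.length];
        for (int i = 0; i < a.length; i++) scores[i] = fieldScore(a[i], b[i]);
        for (int[] rule : RULES) {
            boolean ok = true;
            for (int i = 0; i < rule.length; i++)
                if (scores[i] < rule[i]) { ok = false; break; }
            if (ok) return true;
        }
        return false;
    }
}
```

The rule vectors play the role of the "perfect match criteria" described above; adding a partial-match band would just mean returning a third state when some rule is nearly met.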
Jilles' answer is great and comes from experience. I've also had to work on cleaning up large messy tables and sadly didn't know much about my options at that time (I ended up using Excel and a lot of autofilters). Wish I'd known about OpenRefine.
But if you get to the point where you have to write custom code to do this, I want to make a suggestion as to how: The columns are always the same, right? For instance, the first String is always the key, the second is the First name, the sixth is the ZIP code, tenth is the fax number, etc.?
Assuming there's not an unreasonable number of fields, I would start with a custom Record type which has each DB field as member rather than a position in an array. Something like
class CustomerRow {
    public final String id;
    public final String firstName;
    // ...
    public CustomerRow(String[] data) {
        id = data[0];
        // ...
    }
}
You could also include some validation code in the constructor, if you knew there to be garbage values you always want to filter out.
(Note that you're basically doing what an ORM would do automatically, but getting started with one would probably be more work than just writing the Record type.)
Then you'd implement some Comparator<CustomerRow>s which only look at particular fields, or define equality in fuzzy terms (there's where the edit distance algorithms would come in handy), or do special sorts.
Java uses a stable sort for objects, so to sort by e.g. name, then address, then key, you would just do each sort, but choose your comparators in the reverse order.
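As a sketch (the field names are assumptions), the reverse-order stable-sort trick and Java 8's Comparator chaining produce the same ordering; chaining just states the priority directly:

```java
import java.util.Comparator;
import java.util.List;

class SortDemo {
    static class CustomerRow {
        final String id, lastName, zip;    // illustrative subset of fields
        CustomerRow(String id, String lastName, String zip) {
            this.id = id; this.lastName = lastName; this.zip = zip;
        }
    }

    // Equivalent to: stable-sort by id, then by zip, then by lastName --
    // i.e. the comparators applied in reverse priority order.
    static void sortForGrouping(List<CustomerRow> rows) {
        rows.sort(Comparator.comparing((CustomerRow r) -> r.lastName)
                            .thenComparing(r -> r.zip)
                            .thenComparing(r -> r.id));
    }
}
```

After such a sort, candidate duplicates sit next to each other, so a single pass comparing adjacent rows finds most of them cheaply.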
Also if you have access to the actual database, and it's a real relational database, I'd recommend doing some of your searches as queries where possible. And if you need to go back and forth between your Java objects and the DB, then using an ORM may end up being a good option.

What is the Clear Purpose of SparseBooleanArray? [I referred to the official Android site for this]

I looked at the Android documentation for the "SparseBooleanArray" class, but I still don't get what that class is for. What is its purpose, and for what purposes would we need to use it?
Here is the Doc Link
http://developer.android.com/reference/android/util/SparseBooleanArray.html
From what I get from the documentation it is for mapping Integer values to booleans.
That is, if you want to record whether a widget should be shown for a certain userID, and some userIDs have already been deleted, you would have gaps in your mapping.
Meaning, with a normal array, you would create an array of size = maxID and add a boolean value at index = userID. Then when iterating over the array, you would have to visit maxID elements in the worst case, and check for null wherever there is no boolean for an index (e.g. the ID does not exist). That is really inefficient.
When using a HashMap you could map the ID to the boolean, but with the added overhead of generating the hash value for the key (that is why it is called a *hash* map), which ultimately hurts performance in both CPU cycles and RAM usage.
So SparseBooleanArray seems like a good middle way of dealing with such a situation.
NOTE: Even though my example is really contrived, I hope it illustrates the situation.
Like the javadoc says, SparseBooleanArrays map integers to booleans, which basically means that it's like a map with Integer as the key and Boolean as the value (Map<Integer, Boolean>).
However, it's more efficient in this particular case: it is intended to be more efficient than using a HashMap to map Integers to Booleans.
Hope this clears out any issues you had with the description.
I found a very specific and wonderful use for the sparse boolean array.
You can put a true or false value to be associated with a position in a list.
For example: List item #7 was clicked, so putting 7 as the key and true as the value.
There can be three ways to store resource IDs:
1 Array
A boolean array with IDs as indexes: if we have used an ID, its slot is set to true, else false.
Though all the operations are fast, this implementation requires a huge amount of space (one slot per possible ID), so it can't be used.
High space complexity
2 HashMap
Key - ID
Value - Boolean true/false
Using this, we need to run each ID through the hashing function, which consumes memory. There may also be empty buckets where no ID is stored, and we need to deal with collisions. So, due to usage complexity and medium space complexity, it is not used.
Medium space complexity
3 SparseBooleanArray
It is a middle way: it uses a mapping backed by an array implementation.
Key - ID
Value - Boolean true/false
It keeps the IDs in an array in increasing order, so minimum space is used: it only contains the IDs that are actually in use. To find an ID, binary search is used.
Though binary search, O(log n), is slower than hashing, O(1), or array indexing, O(1), so insertion, deletion, and searching all take more time, there is the least memory wastage. So to save memory we prefer SparseBooleanArray.
Least space complexity
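SparseBooleanArray itself is Android-only, but the structure described in point 3 can be sketched in plain Java: a sorted key array, a parallel value array, and binary-search lookups, with no per-entry object overhead:

```java
import java.util.Arrays;

class MiniSparseBooleanArray {
    private int[] keys = new int[4];
    private boolean[] values = new boolean[4];
    private int size = 0;

    void put(int key, boolean value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) { values[i] = value; return; }  // key exists: overwrite
        i = ~i;                                     // insertion point
        if (size == keys.length) {                  // grow both arrays
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        System.arraycopy(keys, i, keys, i + 1, size - i);
        System.arraycopy(values, i, values, i + 1, size - i);
        keys[i] = key;
        values[i] = value;
        size++;
    }

    // false if absent, matching Android's default for a missing key
    boolean get(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 && values[i];
    }
}
```

The trade-off is exactly as described above: gets are O(log n) instead of O(1), but memory is proportional to the number of entries actually stored.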

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates a set of numbers (based on some predefined conditions). Each set contains up to 6 integers in the range 1 to 100, and the numbers do not have to be unique.
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case, as there might be up to 100k sets stored before the program stops (maybe more, but most of the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
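The sort-and-encode step described above, as a small helper (the class name is made up; the "-" separator matches the example):

```java
import java.util.Arrays;

class SetEncoder {
    // Sorts a copy of the set and joins it with '-' so that order no longer
    // matters: {67, 4, 71, 23, 67} and {4, 23, 67, 67, 71} yield the same key.
    static String encode(int[] set) {
        int[] sorted = set.clone();
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int n : sorted) {
            if (sb.length() > 0) sb.append('-');
            sb.append(n);
        }
        return sb.toString();
    }
}
```

With a HashSet<String> named seen, seen.add(SetEncoder.encode(set)) then returns false exactly when the same multiset of numbers was generated before.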
Thanks!
If you break it into pieces, it seems to me that:
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the HashSet is O(1)
inserting into the HashSet is O(1)
You do this n times, which gives you O(n).
This is already optimal, as you have to touch every element once anyway :)
You might run into problems depending on the range of your random numbers.
E.g. assume you generate only numbers between one and one; then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. However, as long as the number of possible sequences is much larger than the number of elements you generate, I don't see a problem.
One tip: if you know the number of generated elements beforehand, it would be wise to initialize the HashSet with the expected number of elements, i.e. new HashSet<String>(100000).
P.S. Now that other answers are popping up, I'd like to note that while there may be room for improvement on a microscopic level (i.e. using language-specific tricks), your overall approach can't be improved.
Create a class SetOfIntegers
Implement a hashCode() method that will generate reasonably unique hash values
Use a HashMap to store your elements, like put(hashValue, instance)
Use containsKey(hashValue) to check if the same hash value is already present
This way you avoid sorting and the conversion/formatting of your sets. (Note that two different sets can share a hash value, so keying on the hash alone can wrongly treat a new set as a duplicate; to be exact you would still need an equals() check against the stored instance.)
Just use a java.util.BitSet for each set, adding integers to it with the set(int bitIndex) method; you don't have to sort anything. Then check a HashMap (or just a HashSet) for an already existing BitSet before adding a new one. It will be really fast. Don't sort values and build a String for that purpose if speed is important.
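A sketch of the BitSet approach, with one caveat relative to the question: a BitSet records membership only, so {4, 23, 67} and {4, 23, 67, 67} collapse to the same key. That is fine only if repeats either can't occur or shouldn't be distinguished. A HashSet<BitSet> stands in for the HashMap here, since no value needs to be attached:

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

class BitSetDedup {
    private final Set<BitSet> seen = new HashSet<>();

    // Returns true if this combination of numbers has not been seen before.
    boolean addIfNew(int[] numbers) {
        BitSet key = new BitSet(101);      // values range over 1..100
        for (int n : numbers) key.set(n);  // order is irrelevant by construction
        return seen.add(key);              // BitSet has value-based equals/hashCode
    }
}
```

BitSet's equals and hashCode are defined over the bit contents, which is what makes it usable directly as a HashSet element.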

How do I apply a sequence of insert/delete-character operations on a string?

I have a text like this:
My name is Bob and I live in America.
I have some reference to the characters of this string, for example:
from 3 to 7 chars, deleted
at 3 char, added "surname"
from 20 to 25, deleted
at 25 char ....
but these statements aren't ordered (and I can't order them).
So, this is the question: how can I modify the text without losing the reference of the characters?
For example, if I apply the first sentence, my text became:
My is Bob and I live in America.
and my third sentence doesn't work correctly anymore, cause I've lost the reference to the original 20th character.
Keep in mind that the text is pretty long, so I can't use any indexes...
First off, if this statement is true, the situation is hopeless:
but these statements aren't ordered (and I can't order them).
An unordered list of patch statements could lead to a conflict. It will not be possible to decide what the right answer is in an automated fashion. For instance, consider the following situation:
0 1 2
index: 012345678901234567890
text: "apple banana coconuts"
- delete 5 characters from index 10
- add "xyz" at index 10
- delete 10 characters from index 5
You will wind up with different results depending on what order you execute these statements.
For instance, if you apply (1), then (2), then (3), you wind up with "apple banaconuts" --> "apple banaxyzconuts" --> "applenuts".
But if you apply (3), then (2), then (1), you wind up with "appleconuts" --> "appleconutxyzs" --> [error -- there aren't enough characters left to delete!].
Either you need a repeatable, agreed-upon ordering of the statements, or you cannot proceed any further. Even worse, discovering which orderings are valid (for example, eliminating all orderings in which an impossible statement occurs, like "delete 10 characters from index 20" when there is no index 20) blows up combinatorially: with n statements there are n! orderings to check.
If it turns out that the patches can be applied in a specific order (or at least in a repeatable, agreed-upon, deterministic order), the situation improves but is still obnoxious. Because the indices in any "patch" could be invalidated by any previous one, it's not going to be possible to straightforwardly apply each statement. Instead, you'll have to implement a small, pseudo-diff. Here's how I'd do it:
Scan through the list of patch-statements. Build a list of operations. An operation is an object with a command and some optional arguments, and an index to apply that command to. For example, your operations might look like:
DeleteOperation(index 3, length 4)
AddOperation(index 3, text "surname")
DeleteOperation(index 20, length 5)
As you perform operations, keep a reference to the original string and store a "dirty pointer". This is the latest contiguous index in the string which has had no operations performed on it. Any operation you perform whose index exceeds the dirty pointer must first be pre-processed.
If you encounter a clean operation, one whose index is less than or equal to the dirty pointer, you can apply it with no further work. The dirty pointer now moves to that operation's index.
If you encounter a dirty operation, one whose index is greater than the dirty pointer, you'll have to do some work before you can apply it. Determine the real index of where the operation should be applied by looking at the previous operations, then make the appropriate offset and apply it.
Execute each operation in turn until there are no more operations to execute.
The result is your transformed string.
You will just have to track what you are doing to the string and add or subtract from future item indexes in your list of commands.
Each time you execute a statement go over the whole list and modify indexes appropriately.
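A sketch of that bookkeeping (the operation shape and names are assumptions): apply one operation, then shift the index of every not-yet-applied operation that targets a later position. As the answer above notes, operations at the same index can still conflict, so some deterministic tie-break is still needed there.

```java
import java.util.ArrayList;
import java.util.List;

class Patcher {
    static class Op {
        final int index; final String insert; final int deleteLen;
        // insert != null means "insert text at index"; otherwise delete deleteLen chars
        Op(int index, String insert, int deleteLen) {
            this.index = index; this.insert = insert; this.deleteLen = deleteLen;
        }
    }

    static String apply(String text, List<Op> ops) {
        StringBuilder sb = new StringBuilder(text);
        List<Op> pending = new ArrayList<>(ops);
        while (!pending.isEmpty()) {
            Op op = pending.remove(0);
            int shift;
            if (op.insert != null) {
                sb.insert(op.index, op.insert);
                shift = op.insert.length();
            } else {
                sb.delete(op.index, op.index + op.deleteLen);
                shift = -op.deleteLen;
            }
            // Shift later operations so they still point at the right characters.
            for (int i = 0; i < pending.size(); i++) {
                Op p = pending.get(i);
                if (p.index > op.index)
                    pending.set(i, new Op(p.index + shift, p.insert, p.deleteLen));
            }
        }
        return sb.toString();
    }
}
```

For non-overlapping edits like the question's two deletions, this makes the application order irrelevant, which is exactly the property the poster is missing.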
Sounds like you are doing Operational Transforms.
This article discusses them in theory and practice (and in quite some depth):
Understanding and Applying Operational Transformation
If they aren't in any order, though, how can you apply them? How do you know if one operation should be applied before or after another? Are none of the operations additive? (i.e., do all of the operations apply only to the original String?)
What exactly are you trying to do here?
If you are trying to apply these "references" to your text,
My name is Bob and I live in America.
Keep this data unaltered. Copy this to another string, and apply your "reference" there every time you need to modify it.
