Store Math formula - java

I'm working on an application that helps the user create mathematical formulas with variables and then use them afterwards.
Example of a mathematical formula that the user can create:
A*B+ (C/D*E)
After creating the formula, the user can use it in calculations, so I should parse it, retrieve the different variables (in our example: A, B, C, D and E) and create a form for entering the values of these variables.
My question is: How do I store my formula?
I thought of storing it as a String that I would parse manually to retrieve the variables, and then using something like Maths.eval to calculate the formula.
Or else, I could store my formula as JSON to make retrieving the variables easier afterwards.
What do you think is the right way to do it?
Thank you
NB: I know my question may contain spelling mistakes; I apologize.

The usual way to represent, parse and evaluate such formulas is to use a binary expression tree. This is especially useful when you have to deal with operator priorities (* before +), brackets, etc. Moreover, tree traversal is a natural approach to solving such problems, and different traversal types can be used (inorder, preorder, ...).
EDIT: The evaluation can be implemented via a stack.
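To make that concrete, here is a minimal Java sketch of an expression tree for A*B + (C/D*E), assuming the string has already been parsed into nodes; the class and method names are illustrative, not a specific library:

import java.util.*;

abstract class Expr {
    abstract double evaluate(Map<String, Double> values);
    abstract void collectVariables(Set<String> out);
}

class Variable extends Expr {
    final String name;
    Variable(String name) { this.name = name; }
    double evaluate(Map<String, Double> values) { return values.get(name); }
    void collectVariables(Set<String> out) { out.add(name); }
}

class BinaryOp extends Expr {
    final char op;              // '+', '-', '*' or '/'
    final Expr left, right;
    BinaryOp(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    double evaluate(Map<String, Double> values) {
        double l = left.evaluate(values), r = right.evaluate(values);
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default:  throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
    void collectVariables(Set<String> out) { left.collectVariables(out); right.collectVariables(out); }
}

class FormulaDemo {
    public static void main(String[] args) {
        // A*B + (C/D*E)
        Expr formula = new BinaryOp('+',
                new BinaryOp('*', new Variable("A"), new Variable("B")),
                new BinaryOp('*', new BinaryOp('/', new Variable("C"), new Variable("D")), new Variable("E")));
        Set<String> vars = new TreeSet<>();
        formula.collectVariables(vars);     // [A, B, C, D, E] -> build the input form from this
        System.out.println(vars + " = " + formula.evaluate(
                Map.of("A", 2.0, "B", 3.0, "C", 8.0, "D", 4.0, "E", 5.0)));
    }
}

Building the tree from the user's string is the parsing step; a small recursive-descent parser or the shunting-yard algorithm (which uses the stack mentioned in the EDIT) takes care of operator priorities and brackets.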

Related

Calculating the principal axis/eigenvalues and -vectors of large dataset in Java

I have a large dataset (>500,000 elements) that contains the stress values (σ_xx, σ_yy, σ_zz, τ_xy, τ_yz, τ_xz) of FEM elements. These stress values are given in the global xyz coordinate space of the model. I want to calculate the principal stress values and directions from those. If you're not that familiar with the physics behind it, this means taking the symmetric matrix
| σ_xx τ_xy τ_xz |
| τ_xy σ_yy τ_yz |
| τ_xz τ_yz σ_zz |
and calculating its eigenvalues and eigenvectors. Calculating each set of eigenvalues and -vectors on its own is too slow. I'm looking for a library, an algorithm or something in Java that would allow me to do this as array calculations. As an example, in python/numpy I could just take all my 3x3-matrices, stack them along a third dimension to get a nx3x3-array, and pass that to np.linalg.eig(arr), and it automatically gives me an nx3-array for the three eigenvalues and an nx3x3-array for the three eigenvectors.
Things I tried:
nd4j has an Eigen module for calculating eigenvalues and -vectors, but it only supports a single square array at a time.
Calculate the characteristic polynomial and use Cardano's formula to get the roots/eigenvalues - possible to do for the whole array at once, but I'm stuck now on how to get the corresponding eigenvectors. Is there maybe a general, simple algorithm to get from those to the eigenvectors?
Looking for an analytical form of the eigenvalues and -vectors that can be calculated directly: It does exist, but just no.
You'll need to write a little code.
I'd create or use a Matrix class as a dependency and find methods to give you eigenvalues and eigenvectors. The ones you found in nd4j sound like great candidates. You might also consider the Linear Algebra For Java (LA4J) dependency.
Load the dataset into a List<Matrix>.
Use functional Java methods to apply a map to give you a List of eigenvalues as a vector per stress matrix and a List of eigenvectors as a matrix per stress matrix.
You can optimize this calculation by applying the map function to a parallel stream. Java will parallelize the calculation under the covers to leverage the available cores.
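As a rough illustration of that map step, here is a sketch using Apache Commons Math's EigenDecomposition in place of the nd4j/LA4J classes mentioned above (any matrix type with an eigendecomposition would work the same way); the 6-component stress layout per element is an assumption:

import java.util.List;
import java.util.stream.Collectors;
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.EigenDecomposition;

public class PrincipalStress {
    // Build the symmetric 3x3 stress matrix from (sxx, syy, szz, txy, tyz, txz).
    static Array2DRowRealMatrix toMatrix(double[] s) {
        return new Array2DRowRealMatrix(new double[][] {
            { s[0], s[3], s[5] },
            { s[3], s[1], s[4] },
            { s[5], s[4], s[2] }
        });
    }

    // One eigendecomposition per element, run in parallel across the available cores.
    // ed.getRealEigenvalues() gives the principal stresses, ed.getEigenvector(i) the directions.
    static List<EigenDecomposition> decomposeAll(List<double[]> stresses) {
        return stresses.parallelStream()
                       .map(s -> new EigenDecomposition(toMatrix(s)))
                       .collect(Collectors.toList());
    }
}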
Follow-up: This is the way that worked best for me, as I can do all operations without iterating over every element. As stated above, I'm using Nd4j, which seems to be limited in its possibilities compared to numpy (or maybe I just didn't read the documentation thoroughly enough). The following method uses only basic array operations:
From the given stress values, calculate the eigenvalues using Cardano's formula. Only element-wise operations are needed to do that (add, sub, mul, div, pow). The result should be three vectors of size n, each containing one eigenvalue for all elements. (A plain-Java sketch of this step is shown after these steps.)
Use the formula given here to calculate the matrix S for each eigenvalue. Like step 1, this can also be done using only element-wise operations on the stress-value and eigenvalue vectors, which avoids specifying complicated instructions about which array to multiply along which axis while keeping the other axes.
Take one column from S and normalize it to get a normalized eigenvector for the given eigenvalue.
Note that this method only works if you have a real symmetric matrix. You also should make sure to properly deal with cases where the same eigenvalue appears multiple times.
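For reference, a plain-Java sketch of step 1 (the trigonometric/Cardano eigenvalue formula for a symmetric 3x3 matrix), written with double[] arrays; each line inside the loop maps onto an element-wise whole-array operation in Nd4j, and the degenerate diagonal case (p = 0) is omitted for brevity:

class PrincipalEigenvalues {
    // Eigenvalues of symmetric 3x3 matrices via the trigonometric (Cardano) solution
    // of the characteristic polynomial, computed for all n elements at once.
    static double[][] eigenvaluesCardano(double[] sxx, double[] syy, double[] szz,
                                         double[] txy, double[] tyz, double[] txz) {
        int n = sxx.length;
        double[] e1 = new double[n], e2 = new double[n], e3 = new double[n];
        for (int i = 0; i < n; i++) {
            double p1 = txy[i]*txy[i] + tyz[i]*tyz[i] + txz[i]*txz[i];
            double q  = (sxx[i] + syy[i] + szz[i]) / 3.0;                 // trace / 3
            double p2 = (sxx[i]-q)*(sxx[i]-q) + (syy[i]-q)*(syy[i]-q)
                      + (szz[i]-q)*(szz[i]-q) + 2.0 * p1;
            double p  = Math.sqrt(p2 / 6.0);                              // assumes p > 0 (non-diagonal case)
            // r = det(B) / 2 with B = (A - q*I) / p
            double bxx = (sxx[i]-q)/p, byy = (syy[i]-q)/p, bzz = (szz[i]-q)/p;
            double bxy = txy[i]/p,     byz = tyz[i]/p,     bxz = txz[i]/p;
            double r = 0.5 * (bxx*(byy*bzz - byz*byz)
                            - bxy*(bxy*bzz - byz*bxz)
                            + bxz*(bxy*byz - byy*bxz));
            r = Math.max(-1.0, Math.min(1.0, r));                         // clamp for acos
            double phi = Math.acos(r) / 3.0;
            e1[i] = q + 2.0 * p * Math.cos(phi);                          // largest eigenvalue
            e3[i] = q + 2.0 * p * Math.cos(phi + 2.0 * Math.PI / 3.0);    // smallest eigenvalue
            e2[i] = 3.0 * q - e1[i] - e3[i];                              // trace is invariant
        }
        return new double[][] { e1, e2, e3 };
    }
}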

Get all extreme points of Linear Program in CPLEX

I need to enumerate all bases corresponding to all extreme points of an LP with the CPLEX API in Java. Unfortunately I did not find any way to do this with CPLEX. Is there a solution?
If not, I will do it myself, but I will need to work with bases. Is there any simple way with CPLEX to enumerate all bases and check whether a basis is a feasible solution?
The short answer: no.
There is no easy way to do this. One possible approach, but somewhat cumbersome, is to encode the basis using binary variables. E.g.:
xb[i] = 1 for basic variables, 0 for non-basic variables
We need to add constraints on the non-basic variables: they will be at their bound. I.e. for a non-negative variable x[i] we have
xb[i]=0 => x[i]=0
(this is an indicator constraint). Furthermore we know that
sum(i,xb[i]) = m
(the number of basic variables is equal to the number of rows in the model).
Then use Cplex's solution pool to enumerate all possible feasible bases. An illustration for this approach is shown in this link. (This particular example enumerates all optimal bases, but it is not difficult to tell Cplex to enumerate all feasible bases).
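A rough sketch of this encoding with the CPLEX Java (Concert) API is shown below; the existing LP (cplex, x, m) is assumed to be built already, the parameter values are only indicative, and note that the indicator variables turn the LP into a MIP:

import ilog.concert.*;
import ilog.cplex.*;

public class BasisEnumeration {
    // Enumerate feasible bases of an LP with n variables and m rows by adding
    // binary "basis indicator" variables and populating the solution pool.
    public static void enumerate(IloCplex cplex, IloNumVar[] x, int m) throws IloException {
        int n = x.length;
        IloIntVar[] xb = cplex.boolVarArray(n);             // xb[i] = 1 if x[i] is basic

        // Non-basic variables are at their (lower) bound: xb[i]=0 => x[i]=0.
        for (int i = 0; i < n; i++) {
            cplex.add(cplex.ifThen(cplex.eq(xb[i], 0.0), cplex.eq(x[i], 0.0)));
        }
        // Exactly m basic variables (one per row).
        cplex.addEq(cplex.sum(xb), m);

        // Ask the solution pool for all feasible solutions of this MIP, not just optimal ones.
        cplex.setParam(IloCplex.Param.MIP.Pool.Intensity, 4);
        cplex.setParam(IloCplex.Param.MIP.Limits.Populate, 1000000);
        cplex.setParam(IloCplex.Param.MIP.Pool.AbsGap, 1e75);
        cplex.populate();

        for (int s = 0; s < cplex.getSolnPoolNsolns(); s++) {
            double[] basis = cplex.getValues(xb, s);        // which variables are basic in solution s
            // ... inspect / store the basis ...
        }
    }
}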

Creating a derivation tree from a set of Grammar rules

I have 2 arrays defining a set of rules in a context-free grammar, with array 1 being the left side of a rule and array 2 being the right side of a rule, for example:
A = B | C would translate to array1[0] = A, array2[0] = B C
From this, I want to construct all the possible derivations, given an integer that defines how many steps can occur. So for example, A ---> C would constitute 1 step. If the integer were 3, the program would print out all the possible derivations that occur in 3 steps.
Any advice on how to tackle this problem would be appreciated; I've been trying to think of a way around it for hours with no success. I'm using Java.
Thanks.
Since you are using String objects to express derivations, you could take your starting symbol and split its derivation using the delimiter " ". Then you could search for derivations for the non-terminals in the right order and try to recursively derive them.
If there are only terminals left, you can print out the complete derivation, which you have stored in a structure, e.g. a list that contains every derivation step.
Then you backtrack to the last point where other derivation options are still available (a sketch of this approach is given below).
But I think this modelling is not that good, since it is not really efficient. The problem you are dealing with is typically solved by parser implementations, which are also used e.g. for compiler construction. Have a look at this resource (Wikipedia): Parsing.
There are two main approaches: top-down and bottom-up parsing.
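As a starting point, here is a minimal sketch of the recursive enumeration described above, with the rules kept in a Map from non-terminal to its right-hand sides; the names and the "exactly n steps" policy are illustrative choices:

import java.util.*;

public class DerivationEnumerator {
    // rules: non-terminal -> list of right-hand sides, e.g. "A" -> ["B C", "C"]
    private final Map<String, List<String>> rules;

    public DerivationEnumerator(Map<String, List<String>> rules) {
        this.rules = rules;
    }

    // Prints every derivation reachable from 'current' in exactly 'steps' rule applications.
    public void derive(String current, int steps, List<String> history) {
        if (steps == 0) {
            System.out.println(String.join(" ---> ", history) + " ---> " + current);
            return;
        }
        String[] symbols = current.split(" ");
        for (int i = 0; i < symbols.length; i++) {
            List<String> expansions = rules.get(symbols[i]);
            if (expansions == null) continue;            // terminal symbol, nothing to expand
            for (String rhs : expansions) {              // try every alternative, then backtrack
                String[] copy = symbols.clone();
                copy[i] = rhs;
                List<String> nextHistory = new ArrayList<>(history);
                nextHistory.add(current);
                derive(String.join(" ", copy), steps - 1, nextHistory);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = new HashMap<>();
        rules.put("A", Arrays.asList("B C", "C"));
        rules.put("B", Arrays.asList("b"));
        rules.put("C", Arrays.asList("c"));
        new DerivationEnumerator(rules).derive("A", 3, new ArrayList<>());
    }
}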

Fuzzy Matching Duplicates in Java

I have a List<String[]> of customer records in Java (from a database). I know from manually eyeballing the data that 25%+ are duplicates.
The duplicates are far from exact though. Sometimes they have different zips, but the same name and address. Other times the address is missing completely, etc...
After a day of research, I'm still really stumped as to how to even begin to attack this problem.
What are the "terms" that I should be googling for that describe this area (from a solve-this-in-Java perspective)? And I don't suppose there is a fuzzymatch.jar out there that makes it all just too easy?
I've done similar systems before for matching place information and people information. These are complex objects with many features and figuring out whether two different objects describe the same place or person is tricky. The way to do it is to break it down to the essentials.
Here's a few things that you can do:
0) If this is a one-off, load the data into OpenRefine and fix things interactively. At best this solves your problem, at worst it will show you where your possible matches are.
1) There are several ways you can compare strings. Basically they differ in how prone they are to false positives and false negatives. A false positive is when two strings match even though they shouldn't have; a false negative is when they should match but don't. String equals will not produce false positives but will miss a lot of potential matches due to slight variations. Levenshtein with a small threshold is slightly better. N-grams produce a lot of matches, but many of them will be false positives. There are a few more algorithms; take a look at e.g. the OpenRefine code to find various ways of comparing and clustering strings. Lucene implements a lot of this stuff in its analyzer framework but is a bit of a beast to work with if you are not very familiar with its design.
2) Separate the process of comparing stuff from the process of deciding whether you have a match. What I did in the past was qualify my comparisons using a simple numeric score, e.g. this field matched exactly (100), that field was a partial match (75), and that field did not match at all (0). The resulting vector of qualified comparisons, e.g. (100, 75, 0, 25), can be compared to a reference vector that defines your perfect or partial match criteria. For example, if first name, last name, and street match, the two records are the same regardless of the rest of the fields. Or if phone numbers and last names match, that's a valid match too. You can encode such perfect matches as a vector and then simply compare it with your comparison vectors to determine whether it was a match, not a match, or a partial match. This is sort of a manual version of what machine learning does, which is to extract vectors of features and then build up a probability model of which vectors mean what from reference data. Doing it manually can work for simple problems. (A small sketch of this scoring approach follows the list below.)
3) Build up a reference data set with test cases that you know to match or not match, and evaluate your algorithm against that reference set. That way you will know whether you are improving things or making them worse when you tweak e.g. the factor that goes into Levenshtein or whatever.
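To make points 1) and 2) concrete, here is a small sketch that scores a few fields per record pair and then decides on the whole score vector; the field positions, thresholds and match rules are illustrative assumptions, and the hand-rolled Levenshtein helper could be swapped for any string metric library:

public class FuzzyMatcher {
    // Qualify a single field comparison with a score: 100 exact, 75 close, 0 otherwise.
    static int score(String a, String b) {
        if (a == null || b == null || a.isEmpty() || b.isEmpty()) return 0;   // missing data
        if (a.equalsIgnoreCase(b)) return 100;                                // exact match
        double sim = 1.0 - (double) levenshtein(a.toLowerCase(), b.toLowerCase())
                           / Math.max(a.length(), b.length());
        return sim >= 0.8 ? 75 : 0;                                           // partial match
    }

    // Compare the score vector against "perfect match" rules.
    // Assumed column layout: [1]=first name, [2]=last name, [3]=street, [7]=phone.
    static boolean sameCustomer(String[] r1, String[] r2) {
        int[] v = new int[] { score(r1[1], r2[1]), score(r1[2], r2[2]),
                              score(r1[3], r2[3]), score(r1[7], r2[7]) };
        // Rule 1: first name, last name and street all match at least partially.
        if (v[0] >= 75 && v[1] >= 75 && v[2] >= 75) return true;
        // Rule 2: last name and phone number match exactly.
        return v[1] == 100 && v[3] == 100;
    }

    // Classic dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }
}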
Jilles' answer is great and comes from experience. I've also had to work on cleaning up large messy tables and sadly didn't know much about my options at that time (I ended up using Excel and a lot of autofilters). Wish I'd known about OpenRefine.
But if you get to the point where you have to write custom code to do this, I want to make a suggestion as to how: the columns are always the same, right? For instance, the first String is always the key, the second is the first name, the sixth is the ZIP code, the tenth is the fax number, etc.?
Assuming there's not an unreasonable number of fields, I would start with a custom Record type which has each DB field as a member rather than a position in an array. Something like
class CustomerRow {
    public final String id;
    public final String firstName;
    // ...

    public CustomerRow(String[] data) {
        id = data[0];
        // ...
    }
}
You could also include some validation code in the constructor, if you knew there to be garbage values you always want to filter out.
(Note that you're basically doing what an ORM would do automatically, but getting started with one would probably be more work than just writing the Record type.)
Then you'd implement some Comparator<CustomerRow>s which only look at particular fields, or define equality in fuzzy terms (there's where the edit distance algorithms would come in handy), or do special sorts.
Java uses a stable sort for objects, so to sort by e.g. name, then address, then key, you would just do each sort, but choose your comparators in the reverse order.
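For example (assuming CustomerRow also exposes a zip field behind the // ... above, and swapping zip in for the address), the comparators and the reverse-order stable sorts could look like this:

import java.util.Comparator;
import java.util.List;

class CustomerSorting {
    static final Comparator<CustomerRow> BY_KEY  = Comparator.comparing((CustomerRow r) -> r.id);
    static final Comparator<CustomerRow> BY_ZIP  = Comparator.comparing((CustomerRow r) -> r.zip == null ? "" : r.zip);
    static final Comparator<CustomerRow> BY_NAME =
            Comparator.comparing((CustomerRow r) -> r.firstName == null ? "" : r.firstName.trim().toLowerCase());

    // List.sort is stable, so sorting by the least significant key first and the most
    // significant key last yields "name, then zip, then key" ordering, which groups
    // likely duplicates next to each other for a pairwise scan.
    static void sortForDuplicateScan(List<CustomerRow> rows) {
        rows.sort(BY_KEY);
        rows.sort(BY_ZIP);
        rows.sort(BY_NAME);
    }
}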
Also if you have access to the actual database, and it's a real relational database, I'd recommend doing some of your searches as queries where possible. And if you need to go back and forth between your Java objects and the DB, then using an ORM may end up being a good option.

Algorithm to reduce satisfiability java

Is there any algorithm to reduce a SAT problem?
Satisfiability is the problem of determining if the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to TRUE. Equally important is to determine whether no such assignments exist, which would imply that the function expressed by the formula is identically FALSE for all possible variable assignments. In this latter case, we would say that the function is unsatisfiable; otherwise it is satisfiable. To emphasize the binary nature of this problem, it is frequently referred to as Boolean or propositional satisfiability. The shorthand "SAT" is also commonly used to denote it, with the implicit understanding that the function and its variables are all binary-valued.
I have used genetic algorithms to solve this, but it would be easier if the problem were reduced first.
Take a look at Reduced Order Binary Decision Diagrams (ROBDD). They provide a way of compressing boolean expressions to a reduced canonical form. There's plenty of software around for performing the BDD reduction; the Wikipedia article on ROBDD linked above contains a nice list of external links to other relevant packages at the bottom.
You could probably do a depth-first path-tree search on the formula to identify "paths" - i.e., for (ICanEat && (IHaveSandwich || IHaveBanana)), if "ICanEat" is false, the values in brackets don't matter and can be ignored. So, right there you can discard some edges and nodes.
And if, while you're doing this depth-first search, the current node resolves to true, you've found your solution.
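A tiny sketch of that pruning idea, assuming the formula is already a tree of variable/AND/OR nodes; a null result stands for "unknown, depends on unassigned variables", and subtrees are skipped as soon as they cannot change the result:

import java.util.Map;

class BoolNode {
    enum Kind { VAR, AND, OR }
    final Kind kind;
    final String name;            // only for VAR
    final BoolNode left, right;   // only for AND / OR

    BoolNode(String name) { this.kind = Kind.VAR; this.name = name; this.left = null; this.right = null; }
    BoolNode(Kind kind, BoolNode left, BoolNode right) { this.kind = kind; this.name = null; this.left = left; this.right = right; }

    // Example: new BoolNode(BoolNode.Kind.AND, new BoolNode("ICanEat"),
    //              new BoolNode(BoolNode.Kind.OR, new BoolNode("IHaveSandwich"), new BoolNode("IHaveBanana")))
    Boolean eval(Map<String, Boolean> assignment) {
        switch (kind) {
            case VAR:
                return assignment.get(name);                    // null if unassigned
            case AND: {
                Boolean l = left.eval(assignment);
                if (Boolean.FALSE.equals(l)) return false;      // e.g. ICanEat == false: bracketed part never visited
                Boolean r = right.eval(assignment);
                if (Boolean.FALSE.equals(r)) return false;
                return Boolean.TRUE.equals(l) && Boolean.TRUE.equals(r) ? Boolean.TRUE : null;
            }
            default: {                                          // OR
                Boolean l = left.eval(assignment);
                if (Boolean.TRUE.equals(l)) return true;        // right subtree ignored
                Boolean r = right.eval(assignment);
                if (Boolean.TRUE.equals(r)) return true;
                return Boolean.FALSE.equals(l) && Boolean.FALSE.equals(r) ? Boolean.FALSE : null;
            }
        }
    }
}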
What do you mean by "reduced", exactly? I'm going to assume you mean some sort of preprocessing beforehand, to maybe eliminate or simplify some variables or clauses first.
It all depends on how much work you want to do. Certainly you should do unit propagation until it completes. There are other, more expensive things you can do. See the pre-processing section of the march_dl page for some examples.
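For illustration, here is what "unit propagation until it completes" could look like on a clause list in DIMACS-style integer notation; this is only the simplification step, not a full preprocessor:

import java.util.*;

public class UnitPropagation {
    // clauses: e.g. [[1, -2], [2], [-1, 3]] means (x1 v !x2) & (x2) & (!x1 v x3).
    // Repeatedly pick a unit clause, fix its literal to true, drop satisfied
    // clauses and remove falsified literals from the rest.
    public static List<List<Integer>> propagate(List<List<Integer>> clauses) {
        List<List<Integer>> current = new ArrayList<>();
        for (List<Integer> c : clauses) current.add(new ArrayList<>(c));

        while (true) {
            Integer unit = null;
            for (List<Integer> c : current) {
                if (c.size() == 1) { unit = c.get(0); break; }
            }
            if (unit == null) return current;                    // no unit clause left

            List<List<Integer>> next = new ArrayList<>();
            for (List<Integer> c : current) {
                if (c.contains(unit)) continue;                  // clause satisfied, drop it
                List<Integer> reduced = new ArrayList<>(c);
                reduced.remove(Integer.valueOf(-unit));          // falsified literal, remove it
                if (reduced.isEmpty()) {                         // empty clause: formula is UNSAT
                    return Collections.singletonList(Collections.emptyList());
                }
                next.add(reduced);
            }
            current = next;
        }
    }
}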
