interpreting conditions written in natural language into Java code - java

The problem:
I want users to be able to write conditions in a simple syntax in a text editor, as in:
A?outcome1:(B?outcome2:outcome3)
A and B are boolean conditions. So the sentence above means: if A is true, then outcome1; else if B is true, then outcome2; else outcome3.
In Java, I implement an interpreter of this syntax so that A, B, outcome1, outcome2 and outcome3 get translated into values that are pre-stored somewhere (A and B will be functions returning a boolean, outcomes will be objects), and the condition is evaluated and a result is returned.
My question is: am I reinventing the wheel here? Are there Java packages or libraries that already provide neat implementations of "[constrained] natural language interpreted to Java code" functionality?
Thx!

I ended up writing a class in Java that reads a human-written rule and interprets it. Find it here:
https://github.com/seinecle/Umigon/blob/master/src/java/RuleInterpreter/Interpreter.java
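For readers who want the general shape of such an interpreter, here is a minimal sketch of a recursive-descent parser and evaluator for the A?outcome1:(B?outcome2:outcome3) syntax. It is not the code in the linked Interpreter.java; the grammar, the class names and the choice of BooleanSupplier for conditions and a Map for outcomes are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// A parsed rule: either an outcome name or "condition ? ifTrue : ifFalse".
interface RuleNode {
    Object eval(Map<String, BooleanSupplier> conditions, Map<String, Object> outcomes);
}

record OutcomeLeaf(String name) implements RuleNode {
    public Object eval(Map<String, BooleanSupplier> conditions, Map<String, Object> outcomes) {
        if (!outcomes.containsKey(name)) throw new IllegalArgumentException("Unknown outcome: " + name);
        return outcomes.get(name);
    }
}

record ConditionNode(String condition, RuleNode ifTrue, RuleNode ifFalse) implements RuleNode {
    public Object eval(Map<String, BooleanSupplier> conditions, Map<String, Object> outcomes) {
        BooleanSupplier test = conditions.get(condition);
        if (test == null) throw new IllegalArgumentException("Unknown condition: " + condition);
        // Only the branch selected by the condition is evaluated.
        return (test.getAsBoolean() ? ifTrue : ifFalse).eval(conditions, outcomes);
    }
}

// Hypothetical grammar:  expr := '(' expr ')' | NAME ( '?' expr ':' expr )?
final class RuleParser {
    private final String s;
    private int pos;

    private RuleParser(String text) { this.s = text.replaceAll("\\s+", ""); }

    static RuleNode parse(String text) {
        RuleParser p = new RuleParser(text);
        RuleNode node = p.expr();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return node;
    }

    private RuleNode expr() {
        if (peek() == '(') {
            pos++;
            RuleNode inner = expr();
            expect(')');
            return inner;
        }
        String name = name();
        if (peek() != '?') return new OutcomeLeaf(name);
        pos++;
        RuleNode ifTrue = expr();
        expect(':');
        RuleNode ifFalse = expr(); // right-associative: A?x:B?y:z nests to the right
        return new ConditionNode(name, ifTrue, ifFalse);
    }

    private String name() {
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a name at " + start);
        return s.substring(start, pos);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
        pos++;
    }
}

class RuleDemo {
    public static void main(String[] args) {
        Map<String, BooleanSupplier> conditions = new HashMap<>();
        conditions.put("A", () -> false);
        conditions.put("B", () -> true);
        Map<String, Object> outcomes = new HashMap<>();
        outcomes.put("outcome1", "first");
        outcomes.put("outcome2", "second");
        outcomes.put("outcome3", "third");
        RuleNode rule = RuleParser.parse("A?outcome1:(B?outcome2:outcome3)");
        System.out.println(rule.eval(conditions, outcomes)); // prints second
    }
}
```

Parsing into a small tree first, rather than evaluating while reading, means a rule can be parsed once and re-evaluated whenever the stored conditions change.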

Related

Is there any way to write parsing logic using json?

I have a Java map, Map<String,Object> dataMap, whose content looks like this:
{country=Australia, animal=Elephant, age=18}
When processing the map, I may need various conditional statements, such as:
if(dataMap.get("country").toString().contains("stra"))
OR
if(dataMap.get("animal") || 100 ==0)
OR
Some other operation inside if
I want to create a config file that contains all the rules on what the data inside the Map should look like. In simple terms, I want to define in the config file the conditions that the values for the keys country, animal and age must satisfy, and the operations to perform on them, so that the if/else chains and extra code can be removed. The config file will then be used when processing the map.
Can someone tell me how such a config file could be written, and how it can be used from Java?
Sample examples and code references will be of help.
I am thinking of creating a JSON file for this purpose.
Example -
Boolean b = true;
List<String> conditions = new ArrayList<>();
if(dataMap.get("animal").toString().contains("pha")){
    conditions.add("condition1 satisfied");
    if(((Integer.parseInt(dataMap.get("age").toString()) || 100) ==0)){
        conditions.add("condition2 satisfied");
        if(dataMap.get("country").equals("Australia")){
            conditions.add("condition3 satisfied");
        }
        else{
            b=false;
        }
    }
    else{
        b=false;
    }
}
else{
    b=false;
}
Now suppose I want to define, for each map value, the operation (equals, OR, contains) and the test values in a config file instead of using if/else statements. The config file could then be used to process the Java map.
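If the rules really are limited to flat checks like the ones in the example, a minimal "rules as data" sketch might look like the following. The MapRule record, the RuleOp enum and its two operators are hypothetical; reading the list from a JSON file (with whichever JSON library you use) is left out. Anything richer than an AND of simple checks quickly runs into the problems described in the answer below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical operator vocabulary for the config file.
enum RuleOp { EQUALS, CONTAINS }

// One config entry: "the value stored under key must <op> operand".
record MapRule(String key, RuleOp op, String operand) {
    boolean test(Map<String, Object> data) {
        Object value = data.get(key);
        if (value == null) return false; // a missing key fails the rule
        String text = value.toString();
        return switch (op) {
            case EQUALS -> text.equals(operand);
            case CONTAINS -> text.contains(operand);
        };
    }
}

final class MapRuleChecker {
    // Applies the rules in order, recording each satisfied one, and stops at the
    // first failure (like the nested if/else chain in the question).
    static boolean check(List<MapRule> rules, Map<String, Object> data, List<String> satisfied) {
        for (int i = 0; i < rules.size(); i++) {
            if (!rules.get(i).test(data)) return false;
            satisfied.add("condition" + (i + 1) + " satisfied");
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> dataMap = Map.of("country", "Australia", "animal", "Elephant", "age", 18);
        // In practice this list would be deserialized from the config file.
        List<MapRule> rules = List.of(
                new MapRule("animal", RuleOp.CONTAINS, "pha"),
                new MapRule("country", RuleOp.EQUALS, "Australia"));
        List<String> satisfied = new ArrayList<>();
        boolean b = check(rules, dataMap, satisfied);
        System.out.println(b + " " + satisfied); // true [condition1 satisfied, condition2 satisfied]
    }
}
```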
Just to manage expectations: Doing this in JSON is a horrible, horrible idea.
To give you some idea of what you're trying to make:
Grammars like this are best visualized as a tree structure. The 'nodes' in this tree are:
'atomics' (100 is an atom, so is "animal", so is dataMap).
'operations' (+ is an operation, so is or / ||).
potentially, 'actions', though you can encode those as operations.
Java works like this, as do almost all programming languages, and so does a relatively simple 'mathematical expression engine', such as one that can evaluate the string "(1 + 2) * 3 + 5 * 10" into 59.
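To make the tree idea concrete, here is a minimal sketch (not from the answer) of such an expression engine: a recursive-descent parser with one method per precedence level builds Num and BinOp nodes, and evaluation simply walks the tree. Only integers, + - * / and parentheses are supported, and the class names are illustrative.

```java
// Tree nodes: atomics (numbers) and operations (binary operators).
interface Expr {
    long eval();
}

record Num(long value) implements Expr {
    public long eval() { return value; }
}

record BinOp(char op, Expr left, Expr right) implements Expr {
    public long eval() {
        long l = left.eval();
        long r = right.eval();
        return switch (op) {
            case '+' -> l + r;
            case '-' -> l - r;
            case '*' -> l * r;
            case '/' -> l / r;
            default -> throw new IllegalStateException("Unknown operator " + op);
        };
    }
}

// One method per precedence level, so '*' and '/' bind tighter than '+' and '-'.
final class ExprParser {
    private final String s;
    private int pos;

    private ExprParser(String text) { this.s = text.replaceAll("\\s+", ""); }

    static Expr parse(String text) {
        ExprParser p = new ExprParser(text);
        Expr e = p.sum();
        if (p.pos != p.s.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return e;
    }

    // sum := product (('+' | '-') product)*
    private Expr sum() {
        Expr e = product();
        while (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            char op = s.charAt(pos++);
            e = new BinOp(op, e, product());
        }
        return e;
    }

    // product := atom (('*' | '/') atom)*
    private Expr product() {
        Expr e = atom();
        while (pos < s.length() && (s.charAt(pos) == '*' || s.charAt(pos) == '/')) {
            char op = s.charAt(pos++);
            e = new BinOp(op, e, atom());
        }
        return e;
    }

    // atom := NUMBER | '(' sum ')'
    private Expr atom() {
        if (pos < s.length() && s.charAt(pos) == '(') {
            pos++;
            Expr e = sum();
            if (pos >= s.length() || s.charAt(pos) != ')') throw new IllegalArgumentException("Expected ')' at " + pos);
            pos++;
            return e;
        }
        int start = pos;
        while (pos < s.length() && Character.isDigit(s.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a number at " + start);
        return new Num(Long.parseLong(s.substring(start, pos)));
    }
}

class ExprDemo {
    public static void main(String[] args) {
        Expr tree = ExprParser.parse("(1 + 2) * 3 + 5 * 10");
        System.out.println(tree);        // the nested records print the tree shape
        System.out.println(tree.eval()); // 59
    }
}
```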
In Java, dataMap.get("animal") || 100 ==0 is parsed into this tree:
                OR operation
               /            \
      INVOKE get[1]        equality
       /        \          /      \
  dataMap   "animal"   INT(100)   INT(0)
where [1] is, to be very precise, stored as INVOKEINTERFACE java.util.Map :: get(Object), with an IDENT node (an atomic with value dataMap) as the 'receiver' and an args list node containing one element, the string literal atomic "animal".
Once you see this tree, you see how precedence works: your engine needs to be able to represent both (1 + 2) * 3 and 1 + (2 * 3), so doing this without trees is not really possible unless you delve into bizarre syntax where the lexical order matches the processing order (if you want that, look at how reverse Polish notation calculators work, or at a stack-based language such as Forth. I don't think you'll like what you find there).
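For comparison, a minimal sketch of the reverse Polish notation approach: operands are pushed on a stack and each operator pops its two arguments, so (1 + 2) * 3 is written 1 2 + 3 * and 1 + (2 * 3) is written 1 2 3 * +, with no parentheses or precedence rules. The input is assumed to be space-separated integers and the operators + - * /.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class Rpn {
    // Evaluates space-separated reverse Polish notation such as "1 2 + 3 *".
    // Each operator pops two operands and pushes the result, so evaluation
    // is strictly left to right and needs no parentheses.
    static long eval(String input) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String token : input.trim().split("\\s+")) {
            switch (token) {
                case "+", "-", "*", "/" -> {
                    long right = stack.pop(); // the top of the stack is the right operand
                    long left = stack.pop();
                    long result = switch (token) {
                        case "+" -> left + right;
                        case "-" -> left - right;
                        case "*" -> left * right;
                        default -> left / right;
                    };
                    stack.push(result);
                }
                default -> stack.push(Long.parseLong(token));
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("Malformed expression: " + input);
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(eval("1 2 + 3 *")); // (1 + 2) * 3 = 9
        System.out.println(eval("1 2 3 * +")); // 1 + (2 * 3) = 7
    }
}
```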
You're already making language design decisions here. Apparently, you think the language should adopt a 'truthy'/'falsy' concept, where dataMap.get("animal"), which presumably returns an animal object, is to be considered 'true' (as you're using it in a boolean operation) if, presumably, it isn't null or whatnot.
So, you're designing an entire programming language here. Why handicap yourself by insisting that it be written in, of all things, JSON, which is epically unsuitable for the job? Go whole hog and write an entire language. It'll take 2 to 3 years, of course. Doing it in JSON isn't going to knock more than a week off that total, and it will produce something so incredibly annoying to write that nobody would ever use it, buying you nothing.
The language will also naturally trend towards Turing completeness. Once a language is Turing complete, it becomes mathematically impossible to answer questions such as "Is this code ever going to actually finish running, or will it loop forever?" (see 'halting problem'); you have no idea how much memory or CPU power it will take, and other issues arise that then result in security needs. These are solvable problems (sandboxing, for example), but it's all very complicated.
The JVM is, what, 2000 person-years' worth of experience and effort?
If you've got 2000 years to write all this, by all means. The point is: there is no 'simple' way here. Either it's a woefully incomplete thing that never feels like you can actually do what you'd want to do (which is express arbitrary ideas in a manner that feels natural enough, can be parsed by your system, and still makes sense when you read it back), or it's as complex as any language would be.
Why not just ... use a language? Let folks write not JSON but full-blown Java, or JS, or Python, or Ruby, or Lua, or anything else that already exists, is open source, and seems well designed.

Converting Natural language logical condition into a Java code

How do I go about translating a Natural language logical condition into its Java code counterpart?
Say I have this condition
(Color Equals Blue) AND (Name Contains Smith)
What can be done to translate this into Java code, which might look like
(Color.equals("Blue") && (Name.contains("Smith")))
I could not come up with any definitive approach to achieve the desired outcome, so here I am asking this question.
Also, please let me know the reason before down-voting.
I would try a DSL that parses the syntax you specified, (Color Equals Blue) AND (Name Contains Smith), into Java code. Make sure you understand the grammar of this syntax very well before you start translating it into your parser grammar; a language specification would help a lot. In the past I used ANTLR, and it was really easy to generate Java code once I had the syntax tree from the output string.
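As a rough illustration of what such a translator does (hand-written here instead of generated by ANTLR), the sketch below assumes the DSL consists of parenthesized comparisons of the form (Field Operator Value), with Equals and Contains as the only operators, single-word values, and AND binding tighter than OR, as && does in Java. It emits Java source text and uses the field names (Color, Name) as-is; the grammar and class name are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Translates "(Color Equals Blue) AND (Name Contains Smith)" into Java source text.
// Assumed grammar:
//   or      := and ('OR' and)*
//   and     := primary ('AND' primary)*
//   primary := '(' or ')' | FIELD OPERATOR VALUE
final class ConditionTranslator {
    private static final Pattern TOKEN = Pattern.compile("\\(|\\)|[^\\s()]+");

    // Generated text, plus whether it is an unparenthesized && / || chain.
    private record Code(String text, boolean compound) {}

    private final List<String> tokens = new ArrayList<>();
    private int pos;

    private ConditionTranslator(String input) {
        Matcher m = TOKEN.matcher(input);
        while (m.find()) tokens.add(m.group());
    }

    static String translate(String input) {
        ConditionTranslator t = new ConditionTranslator(input);
        Code code = t.or();
        if (t.pos != t.tokens.size()) throw new IllegalArgumentException("Unexpected token: " + t.tokens.get(t.pos));
        return code.text();
    }

    private Code or() {
        Code left = and();
        while (accept("OR")) left = new Code(left.text() + " || " + and().text(), true);
        return left;
    }

    private Code and() {
        Code left = primary();
        while (accept("AND")) left = new Code(left.text() + " && " + primary().text(), true);
        return left;
    }

    private Code primary() {
        if (accept("(")) {
            Code inner = or();
            expect(")");
            // Drop redundant parentheses around a single comparison; keep them around chains.
            return inner.compound() ? new Code("(" + inner.text() + ")", false) : inner;
        }
        String field = next();
        String operator = next();
        String value = next().replace("\\", "\\\\").replace("\"", "\\\""); // escape for a Java literal
        String method = switch (operator.toLowerCase()) {
            case "equals" -> "equals";
            case "contains" -> "contains";
            default -> throw new IllegalArgumentException("Unknown operator: " + operator);
        };
        return new Code(field + "." + method + "(\"" + value + "\")", false);
    }

    private boolean accept(String token) {
        if (pos < tokens.size() && tokens.get(pos).equals(token)) { pos++; return true; }
        return false;
    }

    private void expect(String token) {
        if (!accept(token)) throw new IllegalArgumentException("Expected " + token);
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("Unexpected end of input");
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        // prints Color.equals("Blue") && Name.contains("Smith")
        System.out.println(translate("(Color Equals Blue) AND (Name Contains Smith)"));
    }
}
```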

BigDecimal notation eclipse plugin or nice external tool

I need to perform a lot of operations using BigDecimal, and I find that having to express
Double a = b - c * d; //natural way
as
BigDecimal a = b.subtract(c.multiply(d)); //BigDecimal way
is not only ugly, but a source of mistakes and communication problems between me and business analysts. They were perfectly able to read code with Doubles, but now they can't.
Of course, the perfect solution would be Java support for operator overloading, but since that is not going to happen, I'm looking for an Eclipse plugin or even an external tool that performs an automatic conversion from the "natural way" to the "BigDecimal way".
I'm not trying to preprocess source code, do dynamic translation, or anything complex; I just want something where I can input text and get text back, and keep the "natural way" as a comment in the source code.
P.S.: I've found this incredible smart hack but I don't want to start doing bytecode manipulation. Maybe I can use that to create a Natural2BigDecimal translator, but I don't want to reinvent the wheel if someone has already done such a tool.
I don't want to switch to Scala/Groovy/JavaScript and I also can't, company rules forbid anything but java in server side code.
"I'm not trying to preprocess source code ... I just want something I can input [bigDecimal arithmetic expression] text".
Half of solving a problem is recognizing the problem for what it is. You exactly want something to preprocess your BigDecimal expressions to produce legal Java.
You have only two basic choices:
A stand-alone "domain specific language" and DSL compiler that accepts "standard" expressions and converts them directly to Java code. (This is one kind of preprocessor.) This leaves you with the problem of keeping all the expression fragments around, and somehow knowing where to put them in the Java code.
A tool that reads the Java source text, finds such expressions, and converts them to BigDecimal in the text. I'd suggest something that lets you write the expressions outside the actual code and inserts the translation.
Perhaps (stolen from another answer):
// BigDecimal a = b - c * d;
BigDecimal a = b.subtract( c.multiply( d ) );
with the meaning "compile the BigDecimal expression in the comment into its Java equivalent, and replace the following statement with that translation."
To implement the second idea, you need a program transformation system, which can apply source-to-source rewriting rules to transform the code (generation being a special case of transformation). This is just a preprocessor that is organized to be customizable to your needs.
Our DMS Software Reengineering Toolkit with its Java Front End could do this. You need a full Java parser to do that transformation part; you'll want name and type resolution so that you can parse/check the proposed expression for sanity.
While I agree that the as-is Java notation is ugly, and your proposal would make it prettier, my personal opinion is this isn't worth the effort. You end up with a dependency on a complex tool (yes, DMS is complex: manipulating code isn't easy) for a rather marginal gain.
If you and your team wrote thousands of these formulas, or the writers of such formulas were Java-naive, it might make sense. In that case, I'd go further and simply insist that you write the standard expression format wherever you need it. You could customize the Java Front End to detect when the operand types are of decimal type and do the rewriting for you. Then you simply run this preprocessor before every Java compilation step.
I agree, it's very cumbersome! I use proper documentation (comments before each equation) as the best "solution" to this.
// a = b - c * d;
BigDecimal a = b.subtract( c.multiply( d ) );
You might go the route of an expression evaluator. There is a decent (albeit paid) one at http://www.singularsys.com/jep. ANTLR has a rudimentary grammar that also does expression evaluation (though I am not sure how it would perform) at http://www.antlr.org/wiki/display/ANTLR3/Expression+evaluator.
Neither would give you the compile-time safety you would have with true operators. You could also write the various algorithm-based classes in something like Scala, which does support operator overloading out of the box and would interoperate seamlessly with your other Java classes.

Detecting equivalent expressions

I'm currently working on a Java application where I need to implement a system for building BPF expressions. I also need to implement mechanism for detecting equivalent BPF expressions.
Building the expression is not too hard. I can build a syntax tree using the Interpreter design pattern and implement the toString for getting the BPF syntax.
However, detecting if two expressions are equivalent is much harder. A simple example would be the following:
A: src port 1024 and dst port 1024
B: dst port 1024 and src port 1024
In order to detect that A and B are equivalent, I probably need to transform each expression into a "normalized" form before comparing them. This would be easy for the example above; however, when working with a combination of nested AND, OR and NOT operations, it gets harder.
Does anyone know how I should best approach this problem?
One way to compare boolean expressions may be to convert both to the disjunctive normal form (DNF), and compare the DNF. Here, the variables would be Berkeley Packet Filter tokens, and the same token (e.g. port 80) appearing anywhere in either of the two expressions would need to be assigned the same variable name.
There is an interesting-looking applet at http://www.izyt.com/BooleanLogic/applet.php - sadly I can't give it a try right now due to Java problems in my browser.
I'm pretty sure detecting equivalent expressions is NP-hard (strictly, co-NP-complete), even for boolean-only expressions. That means that to do it perfectly, the straightforward way is basically to build complete tables of all possible combinations of inputs and their results, then compare the tables.
Maybe BPF expressions are limited in some way that changes that? I don't know, so I'm assuming not.
If your problems are small, that may not be a problem. I do exactly that as part of a decision-tree designing algorithm.
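A minimal sketch of that exhaustive truth-table comparison, assuming each distinct BPF primitive (e.g. "src port 1024") is treated as an opaque boolean variable, as in the DNF answer above. It ignores semantic relations between primitives (for instance, that "port 80" matches either "src port 80" or "dst port 80"), and it checks 2^n assignments, so it only suits expressions with a small number n of distinct primitives. The type names are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Boolean expression tree; a Var stands for one BPF primitive such as "src port 1024".
interface BExpr {
    boolean eval(Map<String, Boolean> env);
    void collectVars(Set<String> out);
}

record Var(String name) implements BExpr {
    public boolean eval(Map<String, Boolean> env) { return env.get(name); }
    public void collectVars(Set<String> out) { out.add(name); }
}

record Not(BExpr e) implements BExpr {
    public boolean eval(Map<String, Boolean> env) { return !e.eval(env); }
    public void collectVars(Set<String> out) { e.collectVars(out); }
}

record And(BExpr l, BExpr r) implements BExpr {
    public boolean eval(Map<String, Boolean> env) { return l.eval(env) && r.eval(env); }
    public void collectVars(Set<String> out) { l.collectVars(out); r.collectVars(out); }
}

record Or(BExpr l, BExpr r) implements BExpr {
    public boolean eval(Map<String, Boolean> env) { return l.eval(env) || r.eval(env); }
    public void collectVars(Set<String> out) { l.collectVars(out); r.collectVars(out); }
}

final class TruthTable {
    // Exact but exponential: tries every assignment of the combined variables.
    static boolean equivalent(BExpr a, BExpr b) {
        Set<String> vars = new TreeSet<>();
        a.collectVars(vars);
        b.collectVars(vars);
        List<String> names = new ArrayList<>(vars);
        if (names.size() > 30) throw new IllegalArgumentException("Too many variables for a truth table");
        Map<String, Boolean> env = new HashMap<>();
        for (long mask = 0; mask < (1L << names.size()); mask++) {
            for (int i = 0; i < names.size(); i++) env.put(names.get(i), (mask & (1L << i)) != 0);
            if (a.eval(env) != b.eval(env)) return false; // found a distinguishing input
        }
        return true;
    }

    public static void main(String[] args) {
        BExpr src = new Var("src port 1024");
        BExpr dst = new Var("dst port 1024");
        System.out.println(equivalent(new And(src, dst), new And(dst, src))); // true
    }
}
```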
Alternatively, don't try to be perfect. Allow some false negatives (cases which are equivalent, but which you won't detect).
A simple approach may be to do a variant of normal expression evaluation, but evaluating to an alternative representation of the expression rather than to the result. Impose an ordering on commutative operators. Apply some obvious simplifications during the evaluation. Replace a rich operator set with a minimal set of primitive operators, e.g. using De Morgan's laws to eliminate OR operators.
This alternative representation forms a canonical representation for all members of a set of equivalent expressions. It should be an equivalence class in the sense that you always find the same canonical form for any member of that set. But that's only the set-theory/abstract-algebra sense of an equivalence class - it doesn't mean that all equivalent expressions are in the same equivalence class.
For efficient dictionary lookups, you can use hashes or comparisons based on that canonical representation.
I'd definitely go with syntax normalization. That is, as aix suggested, transform the booleans into DNF and reorder the abstract syntax tree so that the lexically smallest arguments are on the left-hand side. Normalize all comparisons to < and <=. Then two equivalent expressions should have equivalent syntax trees.
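Full DNF can blow up exponentially, so here is a minimal sketch of a cheaper, deliberately imperfect normalization along the lines described in these answers: negations are pushed down to the primitives with De Morgan's laws, nested ANDs and ORs are flattened, and operands are sorted and de-duplicated into a canonical string. Equal strings imply equivalent expressions, but some equivalent pairs, such as (a and b) or (a and not b) versus a, still produce different strings, so false negatives remain possible. The record names are illustrative.

```java
import java.util.List;
import java.util.TreeSet;

interface Formula {}
record Atom(String token) implements Formula {}
record Neg(Formula f) implements Formula {}
record Conj(List<Formula> parts) implements Formula {}
record Disj(List<Formula> parts) implements Formula {}

final class Canonical {
    static String of(Formula f) { return render(f, false); }

    // Renders f (negated if 'negated' is true) with NOT pushed down to the atoms.
    private static String render(Formula f, boolean negated) {
        if (f instanceof Atom a) return negated ? "not(" + a.token() + ")" : a.token();
        if (f instanceof Neg n) return render(n.f(), !negated);
        boolean isAnd = f instanceof Conj;
        List<Formula> parts = isAnd ? ((Conj) f).parts() : ((Disj) f).parts();
        // De Morgan: a negated AND becomes an OR of negations, and vice versa.
        boolean resultIsAnd = isAnd != negated;
        TreeSet<String> operands = new TreeSet<>(); // sorted and de-duplicated
        for (Formula p : parts) collect(p, negated, resultIsAnd, operands);
        String op = resultIsAnd ? " and " : " or ";
        return operands.size() == 1 ? operands.first() : "(" + String.join(op, operands) + ")";
    }

    // Adds p's rendering to 'out', flattening children that end up with the parent's operator.
    private static void collect(Formula p, boolean negated, boolean parentIsAnd, TreeSet<String> out) {
        while (p instanceof Neg n) { p = n.f(); negated = !negated; }
        if (p instanceof Conj || p instanceof Disj) {
            boolean childIsAnd = (p instanceof Conj) != negated;
            if (childIsAnd == parentIsAnd) {
                List<Formula> parts = (p instanceof Conj c) ? c.parts() : ((Disj) p).parts();
                for (Formula q : parts) collect(q, negated, parentIsAnd, out);
                return;
            }
        }
        out.add(render(p, negated));
    }

    public static void main(String[] args) {
        Formula src = new Atom("src port 1024");
        Formula dst = new Atom("dst port 1024");
        System.out.println(of(new Conj(List.of(src, dst)))); // (dst port 1024 and src port 1024)
        System.out.println(of(new Conj(List.of(dst, src)))); // (dst port 1024 and src port 1024)
    }
}
```

Because the canonical form is a plain string, it can also serve directly as the hash or dictionary key mentioned above.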

Programming in Python vs. programming in Java

I've been writing Java for the last couple of years, and now I've started to write in Python (in addition).
The problem is that when I look at my Python code, it looks like someone tried to hammer Java code into a Python format, and it comes out crappy because, well, Python ain't Java.
Any tips on how to escape this pattern of "Writing Java in Python"?
Thanks!
You might consider immersing yourself in the Python paradigms. The best way is to first know what they are then explore the best practices by reading some literature and reviewing some code samples. I recommend Learning Python by Mark Lutz; great for beginners and advanced users.
You'll do yourself a great injustice if you program with Python and fail to leverage all of the built-in, developer-friendly, Pythonic syntax.
As my French teacher used to say, "French isn't just English with different words."
If you are new to Python and coming from Java (or C#, or other similar statically typed OO language), these classic articles from PJ Eby and Ryan Tomayko are necessary reading:
Python Is Not Java (PJE)
Java is not Python, either (PJE)
Python Interfaces are not Java Interfaces (PJE)
The Static Method Thing (Tomayko)
Getters/Setters/Fuxors (Tomayko)
You could start by reading The Zen of Python. It'll give you some insight into how Python code is supposed to be written, provided you understand the language enough to understand what it's talking about. :-)
Some of the major ways in which Python differs from C/Java-like languages are:
List comprehensions.
Support for functional programming.
The use of certain Pythonic constructs instead of similar C-like constructs although both seem to work (list comprehensions can be argued to be a part of this, but there are others).
There are others, but these are the main ones that bugged me when I first started Python (and I had come from years of Java like you).
Before using any of these, it is helpful to understand why you should go for pythonic code rather than the usual C/Java way in Python, although both give you the same output.
For starters, Python provides some powerful features not available in C/Java that make your code much clearer and simpler (although this is subjective, and might not look any better to someone coming from Java at first). The first two points fall into this category.
For example, support for functions as first class objects and closures makes it easy to do things that would need all kinds of weird acrobatics with inner classes in Java.
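To illustrate the point about closures, here is the same "make an adder" closure written as an anonymous inner class (the only option before Java 8, which would also have required a hand-written interface instead of IntUnaryOperator) and as a Java 8 lambda. In Python the equivalent is simply lambda x: x + n, or a nested def. The class and method names are illustrative.

```java
import java.util.function.IntUnaryOperator;

final class Closures {
    // Anonymous-inner-class style: a whole class body just to capture n.
    static IntUnaryOperator adderAnonymous(final int n) {
        return new IntUnaryOperator() {
            @Override
            public int applyAsInt(int x) {
                return x + n;
            }
        };
    }

    // Lambda style (Java 8+): the same closure over n with far less ceremony.
    static IntUnaryOperator adderLambda(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        System.out.println(adderAnonymous(5).applyAsInt(10)); // 15
        System.out.println(adderLambda(5).applyAsInt(10));    // 15
    }
}
```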
But a major reason is that Python is an interpreted language, and certain constructs are much faster than the equivalent C/Java-like code. For example, list comprehensions are usually a lot faster than an equivalent for-loop that iterates over the indices of a list and accesses each item by index. This is a very objective benefit, and IMHO a lot of the "Python is way too slow" way of thinking comes from using Java-style code shoe-horned into Python.
One of the best ways to learn about pythonic code is to read other people's code. I actually learnt a lot by looking at Python code posted in answers to SO questions. These often come with explanations and it is usually obvious why it is better than non-pythonic code (speed, clarity, etc.).
Edit:
Of course, there are other ways of getting other people's code. You can also download and look through the code of any good open source Python project. Books are also a good resource, I would recommend O'Reilly Python Cookbook. It has lots of useful code examples and very detailed explanations.
1) Python supports many (but not all) aspects of object-oriented programming, but it is possible to write a Python program without making any use of OO concepts.
1) Java supports only object-oriented programming.
2) Python is designed to be used interpretively. A Python statement may be entered at the interpreter prompt (>>>) and will be executed immediately. (Implementations make some use of automatic compilation into bytecode, stored in .pyc files.)
2) Programs written in Java must be explicitly compiled into bytecode (.class files), though an IDE may do this automatically in a way that is transparent to the user. Java does not support direct execution of statements, though there are tools like DrJava that support this.
3) Python is dynamically typed:
• A variable is introduced by assigning a value to it. Example:
someVariable = 42
• A variable that has been assigned a value of a given type may later be assigned a value of a different type. Example:
someVariable = 42
someVariable = 'Hello, world'
3) Java is statically typed:
• A variable must be explicitly declared to be of some type before a value is assigned to it, though declaration and assignment may be done at the same time. Examples:
int someVariable;
int someVariable = 42;
• A variable that has been declared to be of a particular type may not be assigned a value of a different type.
4) Python supports the following built-in data types:
• Plain integers (normally 32-bit integers in the range -2147483648 through 2147483647).
• Long integers (size limited only by the memory of the machine running on).
• Booleans (False and True).
• Real numbers.
• Complex numbers.
(This describes Python 2; Python 3 merges plain and long integers into a single arbitrary-precision int type.)
In addition, Python supports a number of types that represent a collection of values, including strings, lists, and dictionaries.
4) Java has two kinds of data types: primitive types and reference types. Java supports the following primitive data types:
• byte - 8-bit integers
• short - 16-bit integers
• int - 32-bit integers
• long - 64-bit integers (Java also supports a class java.math.BigInteger to represent integers whose size is limited only by memory)
• float - 32-bit real numbers
• double - 64-bit real numbers
• boolean - (false and true)
• char - a single character
In addition, Java supports arrays of any type as reference types, and the API includes the class String and a large number of classes used for collections of values.
5) Python is line-oriented: statements end at the end of a line unless the line break is explicitly escaped with \. Several statements can share one line if they are separated by semicolons, but this is considered poor style. Examples:
this is a statement
this is another statement
this is a long statement that extends over more \
    than one line
5) Statements in Java always end with a semicolon (;). It is possible for a statement to run over more than one line, or to have multiple statements on a single line. Examples:
this is a statement;
this is another statement;
this is a long statement that extends over more
    than one line;
a statement; another; another;
6) Python comments begin with # and extend to the end of the line. Example:
# This is a comment
A new statement starts here
6) Java has two kinds of comments. A comment beginning with // extends to the end of the line (like Python comments). Comments can also begin with /* and end with */; these can extend over multiple lines or be embedded within a single line. Examples:
// This is a comment
A new statement starts here
/* This is also a comment */
/* And this is also a comment, which is
   long enough to require several lines
   to say it. */
Statement starts /* comment */ then continues
7) Python strings can be enclosed in either single or double quotes (' or "). A character is represented by a string of length 1. Examples:
'This is a string'
"This is also a string"  # Equivalent
'c'  # A string
"c"  # An equivalent string
Python uses the following operators for constructing compound boolean expressions: and, or and not. Example:
not(x > 0 and y > 0) or z > 0
7) Java strings must be enclosed in double quotes ("). A character is a different type (char) and is enclosed in single quotes ('). Examples:
"This is a String"
'c'  // A character, but not a String
Java uses the following operators for constructing compound boolean expressions: &&, ||, ! and ^ (meaning exclusive or). Example:
! (x > 0 && y > 0) || z > 0 ^ w > 0
8) In Python, the comparison operators (>, <, >=, <=, == and !=) can be applied to numbers, strings, and other types of objects, and compare values in some appropriate way (e.g. numeric order, lexical order) where possible.
8) In Java, most of the comparison operators (>, <, >=, and <=) can be applied only to numeric primitive types (including char) and, via auto-unboxing, to their wrapper classes such as Integer. Two (== and !=) can be applied to any object, but when applied to reference types they test for the same (different) object rather than the same (different) value.
9) Python naming conventions are set out in PEP 8 (e.g. CapWords for class names, lowercase_with_underscores for functions and variables), though they are less uniformly followed than Java's.
9) By convention, most names in Java use mixed case. Class names begin with an uppercase letter; variable and function names begin with a lowercase letter. Class constants are named using all uppercase letters with underscores. Examples:
AClassName
aVariableName
aFunctionName()
A_CLASS_CONSTANT
10) Python definite looping statements have the form for variable in expression: Example:
for p in pixels:
    something
10) Java has two kinds of definite looping statements. One has the form for (Type variable : collection). Example:
for (Pixel p : pixels)
    something;
11) Python uses the built-in function range() with for to loop over a range of integers. Examples:
for i in range(1, 10):
    something
(i takes on values 1, 2, 3, 4, 5, 6, 7, 8, 9)
for i in range(1, 10, 2):
    something
(i takes on values 1, 3, 5, 7, 9)
11) Java uses a different form of for to loop over a range of integers. Examples:
for (int i = 1; i < 10; i++)
    something;
(i takes on values 1, 2, 3, 4, 5, 6, 7, 8, 9)
for (int i = 1; i < 10; i += 2)
    something;
(i takes on values 1, 3, 5, 7, 9)
12) Python conditional statements have the form if condition:, and an optional else part has the form else:. The form elif condition: is allowed as an alternative to an else: immediately followed by an if. Examples:
if x < 0:
    something
if x < 0:
    something
else:
    something different
if x < 0:
    something
elif x > 0:
    something different
else:
    yet another thing
12) Java conditional statements have the form if (condition), and an optional else part has the form else (no colon). There is no elif form; else if is used directly. Examples:
if (x < 0)
    something;
if (x < 0)
    something;
else
    something different;
if (x < 0)
    something;
else if (x > 0)
    something different;
else
    yet another thing;
13) The scope of a Python conditional or looping statement is denoted by indentation. (If multiple lines are to be included, care must be taken to indent every line identically.) Examples:
if x < 0:
    do something
do another thing regardless of the value of x
if x < 0:
    do something
    do something else
    do yet a third thing
do another thing regardless of the value of x
13) The scope of a Java conditional or looping statement is normally just the next statement. Indentation is ignored by the compiler (though stylistically it is still highly desirable for the benefit of a human reader). If multiple lines are to be included, the scope must be delimited by curly braces ({ , }). (Optionally, these can be used even if the scope is a single line.) Examples:
if (x < 0)
    do something;
do another thing regardless of the value of x;
if (x < 0)
    do something; // Bad style-don't do this!
    do another thing regardless of the value of x;
if (x < 0)
{
    do something;
    do something else;
    do yet a third thing;
}
do another thing regardless of the value of x;
if (x < 0)
{
    do something;
}
do another thing regardless of the value of x;
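Point 8 is a classic trap for people moving between the two languages, so here is a small self-contained Java sketch of it: == on two separately constructed String objects compares identity, equals compares contents, and ordering strings goes through compareTo rather than < or >. The class name is illustrative.

```java
final class RefEquality {
    // Builds two distinct String objects with the same contents and compares them both ways.
    static boolean[] compare(String text) {
        String a = new String(text);
        String b = new String(text);
        return new boolean[] { a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        boolean[] result = compare("hello");
        System.out.println("a == b      : " + result[0]); // false: two different objects
        System.out.println("a.equals(b) : " + result[1]); // true: same characters
        // Strings cannot be compared with < or >; compareTo gives lexical order instead.
        System.out.println("apple".compareTo("banana") < 0); // true
    }
}
```

In Python, by contrast, == on two equal strings is True, and the identity check is spelled a is b.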
If you want to see some fairly idiomatic Python that does non-trivial stuff, there's Dive Into Python, although Dive Into Python 3 is newer and might be a better source of style tips. If you're looking more for some points to review, there's Code Like a Pythonista.
You could post your code at Refactor my code to see if someone can show you a more Pythonic way to do it.
Definitely not a panacea but I think you should try some code golf in Python. Obviously nobody should write "golfed" code IRL, but finding the most terse way to express something really forces you to exploit the built in functionality of the language.
Someone provided me with this list of how "Python is not Java" when I started Python after Java, and it was very helpful.
Also, check out this similar SO question that I posted a short time ago when in a similar position.
Try to find algorithms that you understand well and see how they are implemented in python standard libraries.
Persist. :)
Learn a few other languages. It will help you distinguish between algorithms (the structure of processing, unchanged between languages) and the local syntactic features of each language. Then you can "write Foo in Bar" for any combination of languages "Foo" and "Bar".
Eat Python, sleep Python and drink Python. That is the only way...
This is useful if you want to understand how to code in Python in a more Pythonic or correct way: http://www.python.org/dev/peps/pep-0008/
