Antlr AST Tree Approach To Complex Grammar - java

I have written a complex grammar. The grammar can be seen below:
grammar i;
options {
output=AST;
}
@header {
package com.data;
}
operatorLogic : 'AND' | 'OR';
value : STRING;
query : (select)*;
select : 'SELECT'^ functions 'FROM table' filters?';';
operator : '=' | '!=' | '<' | '>' | '<=' | '>=';
filters : 'WHERE'^ conditions;
conditions : (members (operatorLogic members)*);
members : STRING operator value;
functions : '*';
STRING : ('a'..'z'|'A'..'Z')+;
WS : (' '|'\t'|'\f'|'\n'|'\r')+ {skip();}; // handle white space between keywords
The output is an AST. The above is only a small sample; I am developing a much larger grammar and need advice on how to approach this.
For example according to the above grammar the following can be produced:
SELECT * FROM table;
SELECT * FROM table WHERE name = i AND name = j;
This query could get more complex. I have implemented the AST handling in the Java code and can get the tree back. I wanted to separate the grammar from the logic so that each stays cohesive, and an AST seemed the best approach.
The user will enter a query as a String and my code needs to handle the query in the best way possible. As you can see, the functions rule currently matches only *, which means select all. In the future this could expand to include other things.
How can my code handle this? What's the best approach?
I could do something like this:
String input = "SELECT * FROM table;";
if(input.startsWith("SELECT")) {
select();
}
As you can see, this approach quickly gets complicated: I also need to handle * and the optional filters, and the operatorLogic (AND and OR) needs handling as well.
What is the best way? I have looked online, but couldn't find any example on how to handle this.
Are you able to give any examples?
EDIT:
String input = "SELECT * FROM table;";
if(input.startsWith("SELECT")) {
select();
}
else if(input.startsWith("SELECT *")) {
findAll();
}

The easiest way to handle multiple starting rules ("SELECT ...", "UPDATE...", etc) is to let the ANTLR grammar do the work for you at a single, top-level starting rule. You pretty much have that already, so it's just a matter of updating what you have.
Currently your grammar is limited to one command-type of input ("SELECT...") because that's all you've defined:
query : (select)*; //query only handles "select" because that's all there is.
select : 'SELECT'^ functions 'FROM table' filters?';';
If query is your starting rule, then accepting additional top-level input is a matter of defining query to accept more than select:
query : (select | update)*; //query now handles any number of "select" or "update" rules, in any order.
select : 'SELECT'^ functions 'FROM table' filters?';';
update : 'UPDATE'^ ';'; //simple example of an update rule
Now the query rule can handle input such as SELECT * FROM table;, UPDATE;, or SELECT * FROM table; UPDATE;. When a new top-level rule is added, just update query to test for that new rule. This way your Java code doesn't need to test the input, it just calls the query rule and lets the parser handle the rest.
If you only want one type of input to be processed from the input, define query like this:
query : select* //read any number of selects, but no updates
| update* //read any number of updates, but no selects
;
The rule query still handles SELECT * FROM table; and UPDATE;, but not a mix of commands, like SELECT * FROM table; UPDATE;.
Once you get your query_return AST tree from calling query, you now have something meaningful that your Java code can process, instead of a string. That tree represents all the input that the parser processed.
You can walk through the children of the tree like so:
iParser.query_return r = parser.query();
CommonTree t = (CommonTree) r.getTree();
for (int i = 0, count = t.getChildCount(); i < count; ++i) {
CommonTree child = (CommonTree) t.getChild(i);
System.out.println("child type: " + child.getType());
System.out.println("child text: " + child.getText());
System.out.println("------");
}
Walking through the entire AST tree is a matter of recursively calling getChild(...) on all parent nodes (my example above looks at the top-level children only).
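A recursive walk might look like the sketch below. To keep it runnable without the ANTLR jar, it uses a small hypothetical Node class that mimics the three CommonTree methods used above (getText(), getChildCount(), getChild(int)); with the real runtime you would pass the CommonTree itself. The tree built in main is only an assumption about what the parser might produce for SELECT * FROM table WHERE name = i;.

```java
import java.util.ArrayList;
import java.util.List;

public class AstWalkSketch {

    /** Hypothetical stand-in for ANTLR's CommonTree (not part of ANTLR). */
    static final class Node {
        private final String text;
        private final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        String getText() { return text; }
        int getChildCount() { return children.size(); }
        Node getChild(int i) { return children.get(i); }
    }

    /** Depth-first walk: record this node, indented by depth, then recurse into each child. */
    static void walk(Node node, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        out.add(line.append(node.getText()).toString());
        for (int i = 0, count = node.getChildCount(); i < count; ++i) {
            walk(node.getChild(i), depth + 1, out);
        }
    }

    /** Convenience overload that starts at depth 0 and returns the collected lines. */
    static List<String> walk(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, 0, out);
        return out;
    }

    public static void main(String[] args) {
        // Assumed tree shape: 'SELECT' and 'WHERE' are roots because of the ^ operator.
        Node tree = new Node("SELECT",
                new Node("*"),
                new Node("FROM table"),
                new Node("WHERE", new Node("name"), new Node("="), new Node("i")),
                new Node(";"));
        for (String line : walk(tree)) {
            System.out.println(line);
        }
    }
}
```

Instead of printing, the same recursion could switch on each node's type or text to decide which piece of Java logic to run.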
Handling alternatives to * is no different than any other alternatives you've defined: just define the alternatives in the rule you want to expand. If you want functions to accept more than *, define functions to accept more than *. ;)
Here's an example:
functions: '*' //"all"
| STRING //some id
;
Now the parser can accept SELECT * FROM table; and SELECT foobar FROM table;.
Remember that your Java code has no reason to examine the input string. Whenever you're tempted to do that, look for a way to make your grammar do the examining instead. Your Java code will then look at the AST tree output for whatever it wants.

Related

Parse SQL Statement and reconstructing after modifications using java

Does anyone know how to parse SQL statements and build them back again using Java? This is required because I need to add extra columns to the WHERE clause based on some conditions. For example, based on the logged-on user, I need to decide whether the user is restricted from seeing certain records, such as a user outside the USA.
Use JSqlParser.
examples:
CCJSqlParserManager ccjSqlParserManager = new CCJSqlParserManager();
Select statement = (Select) ccjSqlParserManager.parse(new FileReader(path));
PlainSelect plainSelect = (PlainSelect) statement.getSelectBody();
Expression expression = plainSelect.getWhere();
See the example below:
String sqlstr = "select * from [table name] where [column 1]='value'";
if (yourCondition) { // replace yourCondition with your own check
    sqlstr = sqlstr + " or [column 2]='value 2'";
}
// Now write your execution statement

Postgresql Array Functions with QueryDSL

I use Vlad Mihalcea's library to map SQL arrays (PostgreSQL in my case) to JPA. Let's imagine I have an entity, e.g.
@TypeDefs({
    @TypeDef(name = "string-array", typeClass = StringArrayType.class)
})
@Entity
public class Entity {
    @Type(type = "string-array")
    @Column(columnDefinition = "text[]")
    private String[] tags;
}
The appropriate SQL is:
CREATE TABLE entity (
tags text[]
);
Using QueryDSL, I'd like to fetch the rows whose tags contain all of the given ones. The raw SQL could be:
SELECT * FROM entity WHERE tags @> '{"someTag","anotherTag"}'::text[];
(taken from: https://www.postgresql.org/docs/9.1/static/functions-array.html)
Is it possible to do it with QueryDSL? Something like the code below?
predicate.and(entity.tags.eqAll(<whatever>));
The 1st step is to work out the proper SQL: WHERE tags @> '{"someTag","anotherTag"}'::text[];
The 2nd step is described by coladict (thanks a lot!): figure out the functions which are called: @> is arraycontains and ::text[] is string_to_array
The 3rd step is to call them properly. After hours of debugging I figured out that HQL doesn't treat functions as functions unless I add a comparison (in my case: ... = true), so the final solution looks like this:
predicate.and(
Expressions.booleanTemplate("arraycontains({0}, string_to_array({1}, ',')) = true",
entity.tags,
tagsStr)
);
where tagsStr is a String of values separated by commas.
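As a small sketch of how tagsStr could be built (the class and method names here are illustrative and not part of QueryDSL or Hibernate), the tags can be joined with String.join. It assumes no tag contains a comma, since string_to_array(..., ',') would split such a tag in two; the helper rejects that case instead of guessing.

```java
import java.util.Arrays;
import java.util.List;

public class TagsParam {

    /** Joins tags into the comma-separated form that string_to_array({1}, ',') splits back apart. */
    static String toTagsStr(List<String> tags) {
        for (String tag : tags) {
            if (tag.contains(",")) {
                // A comma inside a tag would be treated as a separator on the SQL side.
                throw new IllegalArgumentException("tag must not contain a comma: " + tag);
            }
        }
        return String.join(",", tags);
    }

    public static void main(String[] args) {
        System.out.println(toTagsStr(Arrays.asList("someTag", "anotherTag"))); // someTag,anotherTag
    }
}
```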
Since you can't use custom operators, you will have to use their functional equivalents. You can look them up in the psql console with \doS+. For \doS+ @> we get several results, but this is the one you want:
List of operators
Schema | Name | Left arg type | Right arg type | Result type | Function | Description
------------+------+---------------+----------------+-------------+---------------------+-------------
pg_catalog | @> | anyarray | anyarray | boolean | arraycontains | contains
It tells us the function used is called arraycontains, so now we look up that function to see its parameters using \df arraycontains
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------------+------------------+---------------------+--------
pg_catalog | arraycontains | boolean | anyarray, anyarray | normal
From here, we transform the target query you're aiming for into:
SELECT * FROM entity WHERE arraycontains(tags, '{"someTag","anotherTag"}'::text[]);
You should then be able to use the builder's function call to create this condition.
// root is assumed to be the Root<Entity> of the surrounding CriteriaQuery
ParameterExpression<String[]> tags = cb.parameter(String[].class);
Expression<Boolean> tagcheck = cb.function("arraycontains", Boolean.class, root.get(Entity_.tags), tags);
Though I use a different array solution (might publish soon), I believe it should work, unless there are bugs in the underlying implementation.
An alternative method would be to compile the escaped string format of the array and pass it as the second parameter. It's easier to print if you don't treat the double quotes as optional. In that case, you have to replace String[] with String in the ParameterExpression line above.
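A minimal sketch of building that escaped string form by hand follows; the helper name is hypothetical. It always double-quotes each element and backslash-escapes embedded backslashes and double quotes, which is the quoting PostgreSQL array literals accept. It assumes no element is null.

```java
public class PgArrayLiteral {

    /** Builds a PostgreSQL array literal such as {"a","b"} from the given elements. */
    static String toArrayLiteral(String... elements) {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < elements.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"');
            for (char c : elements[i].toCharArray()) {
                // Inside a quoted element, backslash and double quote need a backslash prefix.
                if (c == '"' || c == '\\') {
                    sb.append('\\');
                }
                sb.append(c);
            }
            sb.append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        System.out.println(toArrayLiteral("someTag", "anotherTag")); // {"someTag","anotherTag"}
    }
}
```

The resulting String would be bound as a query parameter, never concatenated into the SQL text.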
For EclipseLink I created a function
CREATE OR REPLACE FUNCTION check_array(array_val text[], string_comma character varying ) RETURNS bool AS $$
BEGIN
RETURN arraycontains(array_val, string_to_array(string_comma, ','));
END;
$$ LANGUAGE plpgsql;
As pointed out by Serhii, you can then use Expressions.booleanTemplate("FUNCTION('check_array', {0}, {1}) = true", entity.tags, tagsStr)

SPARQL ARQ Query Execution

I have this piece of Jena code, which builds a query from Triple patterns in an ElementTriplesBlock and finally calls QueryFactory.make(). I have a local Virtuoso instance set up, so my SPARQL endpoint is on localhost, i.e. just http://localhost:8890/sparql. The RDF data I am querying is generated by the Lehigh University Benchmark generator. Now I am trying to replace the triples in the query pattern based on some conditions. For example, if the query is made of two BGPs or triple patterns and one of the triple patterns gives zero results, I'd want to change that triple pattern to something else. How do I achieve this in Jena? My code looks like this:
//Create your triples
Triple pattern1 = Triple.create(Var.alloc("X"),Node.createURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),Node.createURI("http://swat.cse.lehigh.edu/onto/univ-bench.owl#AssociateProfessor"));
Triple pattern = Triple.create(Var.alloc("X"), Node.createURI("http://swat.cse.lehigh.edu/onto/univ-bench.owl#emailAddress"), Var.alloc("Y2"));
ElementTriplesBlock block = new ElementTriplesBlock();
block.addTriple(pattern1);
block.addTriple(pattern);
ElementGroup body = new ElementGroup();
body.addElement(block);
//Build a Query here
Query q = QueryFactory.make();
q.setPrefix("ub", "http://swat.cse.lehigh.edu/onto/univ-bench.owl#");
q.setQueryPattern(body);
q.setQuerySelectType();
q.addResultVar("X");
//?X ub:emailAddress ?Y2 .
//Query to String
System.out.println(q.toString());
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://localhost:8890/sparql", q);
Op op = Algebra.optimize(Algebra.compile(q));
System.out.println(op.toString());
So to be clear I am able to actually see the BGP in a Relational Algebra form by using the Op op = Algebra.optimize(Algebra.compile(q)) line. The output looks like
(project (?X)
(bgp
(triple ?X <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://swat.cse.lehigh.edu/onto/univ-bench.owl#AssociateProfessor>)
(triple ?X <http://swat.cse.lehigh.edu/onto/univ-bench.owl#emailAddress> ?Y2)
))
Now how would I go about evaluating the execution of each triple? In this case, if I just wanted to print the number of results at each step of the query pattern execution, how would I do it? I did read some of the examples here. I guess one has to use an OpExecutor and a QueryIterator but I am not sure how they all fit together. In this case I just would want to iterate through each of the basic graph patterns and then output the basic graph pattern and the number of results that it returns from the end point. Any help or pointers would be appreciated.

Mockrunner(Java) query with Regex

I am using Mockrunner to mock Sql DB for my unit tests. Following is my query:-
"select * from table where userId in (" + userIds + ")"
Now my userIds is state dependent. I don't want my test cases to depend on the order of the elements inside the userIds list, so I need regex matching rather than an exact match. I have already enabled regex matching with the code below:
StatementResultSetHandler statementHandler = connection.getStatementResultSetHandler();
usersResult = statementHandler.createResultSet("users");
statementHandler.setUseRegularExpressions(true);
//How to write this regex query?
statementHandler.prepareResultSet("select * from table where userId in .*", campaignsResult);
But as noted in the comment, I have no idea what regex syntax Mockrunner supports.
Edit: I am unable to match queries like "Select * from tables" with "Select * from tab .*". So it has something to do with the way I am using regex with Mockrunner.
There are some helpful examples available here. For instance:
public void testCorrectSQL() throws Exception {
MockResultSet result = getStatementResultSetHandler().createResultSet();
getStatementResultSetHandler().prepareResultSet("select.*isbn,.*quantity.*", result);
List orderList = new ArrayList();
orderList.add("1234567890");
orderList.add("1111111111");
Bookstore.order(getJDBCMockObjectFactory().getMockConnection(), orderList);
verifySQLStatementExecuted("select.*isbn,.*quantity.*\\(isbn='1234567890'.*or.*isbn='1111111111'\\)");
}
From this, I surmise that it's using standard Java regex syntax. In which case, you probably want:
prepareResultSet("select \\* from table where userId in \\(.*\\)", campaignsResult);
...or perhaps more succinctly (and depending upon exactly how fine-grained your tests need to be):
prepareResultSet("select .* from table where userId in .*", campaignsResult);
The main caveat to be aware of when enabling the regex matching is that any literal special characters that you want in your query (such as *, (, and ) literals) need to be escaped in your regex before it will work properly.
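The effect of the escaping can be checked outside Mockrunner with plain java.util.regex, assuming (as surmised above) that Mockrunner applies standard Java regex semantics. The class below is only an illustration and does not call Mockrunner; Pattern.quote is shown as an alternative way to escape a literal prefix.

```java
import java.util.regex.Pattern;

public class QueryRegexCheck {

    /** Returns true when the whole SQL string matches the regex. */
    static boolean matches(String regex, String sql) {
        return Pattern.matches(regex, sql);
    }

    public static void main(String[] args) {
        String sql = "select * from table where userId in (3,1,2)";

        // Escaped: \* and \( \) are treated as literal characters.
        System.out.println(matches("select \\* from table where userId in \\(.*\\)", sql)); // true

        // Unescaped: " *" means "zero or more spaces", so the literal * is never matched.
        System.out.println(matches("select * from table where userId in .*", sql)); // false

        // Pattern.quote escapes a whole literal prefix in one go.
        String quoted = Pattern.quote("select * from table where userId in (") + ".*\\)";
        System.out.println(matches(quoted, sql)); // true
    }
}
```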

Parsing SQL commands

Basically I need to be able to parse a couple of SQL commands but I am not really sure of a good way.
Here is an example of the SQL commands
CREATE TABLE DEPARTMENT (
deptid INT CHECK(deptid > 0 AND deptid < 100),
dname CHAR(30),
location CHAR(10),
PRIMARY KEY(deptid)
);
INSERT INTO STUDENT VALUES (16711,'A.Smith',22,'A',20);
I am coding in Java, so would split be the best way? Or should I write my own parser? If so, can someone give me an example of how to parse it, specifically Strings that are surrounded by ' ' but might contain a ' inside? Also, for the CREATE TABLE I need to somehow separate the CHECK parameters.
If you are working in Java, I'd suggest looking into a Java-based parser generator with an available SQL grammar. ANTLR seems like a good choice for this.
If you're OK with using third parties, there are numerous parsers out there, including:
jOOQ, which has a parser
JsqlParser
General SQL Parser
With jOOQ, for example, you could parse your SQL like this:
Queries queries = ctx.parser().parse("CREATE TABLE ...");
And then access the expression tree, e.g. to find the check constraint
ctx.parser().parse("CREATE TABLE t (i int check (i > 0))")
.forEach(q -> {
if (q instanceof CreateTable ct) {
for (TableElement e : ct.$tableElements()) {
if (e instanceof Constraint c) {
println(c);
}
}
}
});
Note that as of jOOQ 3.17, the traversal is still experimental.
Disclaimer: I work for the company behind jOOQ
