Why isn't parsing a Gremlin query in Java generic?

I'm parsing a Gremlin query in Java (well, actually I'm writing Scala, and using the Groovy compiled JARs like it was Java).
The query is a String variable that is given by user input. In other words - I cannot tell what the query will be, I'm only assuming it's a valid Gremlin query (syntactically and logically).
I started with a simple Gremlin.compile(query), which returns a Pipe that I iterate over. However, according to the example, one must invoke .setStarts before iterating the Pipe, and to do that I must know the runtime type S of my Pipe<S,E>.
It feels like this API isn't generic enough, the following line from the example
pipe.setStarts(new SingleIterator<Vertex>(graph.getVertex(1)));
will work for some cases, but for a vertex iteration query (g.V(), for example) it will throw a ClassCastException.
Is there a way to work around it?
Perhaps using the underlying Script Engine (like the next examples in the link above) will help me to achieve more generic code?

I found a workaround. It feels a bit ugly but it does the job.
I'm using a ScriptEngine with 'g' bound to the Graph, so the user can start queries with g. (this doesn't help with generics, but it is more user-friendly, because the user doesn't have to begin queries with the Identity Pipe (_())).
(Kind of ugly, I know.) I extract the starting vertex from the query string with a regex (if there is one), look it up programmatically and, if it is found, invoke setStarts with it. If it's not found, I pass the Graph itself to setStarts, assuming it's a vertex iteration query.
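A minimal sketch of the regex step, assuming that queries naming a start vertex begin with g.v(<id>), where the id is a number or a quoted string. The class name, method name and pattern are illustrative and not part of Gremlin. It only extracts the id; looking the vertex up and calling setStarts is left to the surrounding code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartVertexExtractor {
    // Assumption: a start vertex is written as g.v(1), g.v('id') or g.v("id")
    // at the very beginning of the query. The match is case-sensitive, so
    // g.V() (vertex iteration) is not treated as a start vertex.
    private static final Pattern START_VERTEX = Pattern.compile(
            "^\\s*g\\.v\\(\\s*(?:(\\d+)|'([^']*)'|\"([^\"]*)\")\\s*\\)");

    // Returns the id of the starting vertex, or empty if the query has none.
    static Optional<String> startVertexId(String query) {
        Matcher m = START_VERTEX.matcher(query);
        if (!m.find()) {
            return Optional.empty();
        }
        // Exactly one of the three alternatives captured the id.
        for (int group = 1; group <= 3; group++) {
            if (m.group(group) != null) {
                return Optional.of(m.group(group));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(startVertexId("g.v(1).out('knows').name"));
        System.out.println(startVertexId("g.V().name"));
    }
}
```

When the result is empty, the caller would fall back to passing the Graph itself, as described above.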

Related

Parsing non-fixed format binary payload with a custom javascript conversion in Vorto

We are using Vorto now mainly as a normalized format and are starting to look into using the mapping engine for mapping different payload formats to Vorto model as well. I more or less understand how to map functionblock properties from JSON or binary payload using xpath and the conversion functions. However, I'm not clear how to support parsing of non-fixed format binary payload using this method.
For instance we have an off the shelf LoRaWAN sensor which transmits in the following format:
<length><frame type>[<sensor-id><sensor-value>] where length is the total frame length and sensor-id (for eg temperature, humidity, battery, ...) describes how to parse the sensor-value (ie length, datatype). In one frame multiple of these readings may be present in random order.
Parsing this can be done easily in, for instance, loraserver.io using a small javascript function which iterates over all the bytes and returns the parsed properties. The same approach will work in the Ditto payload mapping engine, afaik.
However, currently I don't see how to do something similar in Vorto mapping. This is just one specific sensor example of course, but more examples exist on the market using similar dynamic payload format. I know there is already an open issue (#1535) to improve the documentation, but it would already be helpful to know if such flexible parsing would be possible using the mapping DSL.
I tried passing the raw payload as bytearray to the javascript function. In order to test this I duplicated the org.eclipse.vorto.mapping.engine.converter.binary.BinaryMappingTest#testMappingBinaryContaining2DataPoints and adapted the model to use a custom javascript function like this
evaluator.addScriptFunction(new ScriptClassFunction("extractTemperature",
"function extractTemperature(value) { " +
" print(\"parameter of type \" + typeof value + \", value = \" + value);" +
" print(value[1]);" +
"}"));
The output of this function is
parameter of type number, value = 1
undefined
Where the value 1 is the first element of the bytearray used.
So the function does not seem to receive the parameter as a byte array.
The model is configured with .withXPathStereotype("custom:extractTemperature(data)", "demo") so the payload is passed (as BinaryData) in the same way as in the testMappingBinaryContaining2DataPoints test (.withXPathStereotype("custom:convert(vorto_conversion1:byteArrayToInt(data,0,0,0,2))", "demo")). The only difference I see is that in the testMappingBinaryContaining2DataPoints test the byte array parameter is passed to a Java function instead of a javascript function. Or am I missing something?
Also, I noticed that loop keywords like for and while are not allowed in the javascript code. So even if I could access the byte array parameter in the javascript function, I currently see no way to iterate over it.
On gitter I received following reply (together with the suggestion to move discussion to SO)
You are right. We restricted the Javascript function usage to a very rudimentary set of language keywords, excluding for loops, as nasty stuff can be implemented there. What you could do instead is register a Java function in your own namespace with the mapping engine. That function can hold a byte array. Later this function can be contributed to the mapping engine as a standard function to extract a certain value, for other developers to reuse.
I don't think this is a solution to the problem, however. As mentioned above, this is just one example of an off-the-shelf sensor payload format, and I don't see how it can be generalized enough to include as a generic function in the mapping engine. And I don't think it should be required to implement a sensor-specific conversion in Java, since (as an end user of an IoT platform wanting to deploy a new sensor type) this is more complex to develop and deploy than a little javascript function which can be altered at runtime in the mapping spec. I see a lot of value in being able to do simple mappings in javascript, just as can be done in, for example, loraserver.io and Eclipse Ditto.
I think being able to pass a byte array to javascript is a first step. Also I wonder where exactly the risk is in allowing loops in the javascript? For example Ditto also has some restrictions in the javascript sandbox (see here) but this allows loops and only prevents endless looping and recursion.
They state the following:
Using Rhino instead of Nashorn, the newer JavaScript engine shipped with Java, has the benefit that sandboxing can be applied in a better way.
Sandboxing of different payload scripts is required as Ditto is intended to be run as cloud service where multiple connections to different endpoints are managed for different tenants at the same time. This requires the isolation of each single script to avoid interference with other scripts and to protect the JVM executing the script against harmful code execution.
Would using Rhino in Vorto as well make it possible to control the risks you see and allow loop constructs in Vorto mappings?
PS: can someone with enough SO reputation points add the tag eclipse-vorto please?

I created an issue for your request to support this in the Javascript converters: https://github.com/eclipse/vorto/issues/2029
As stated in the issue, as a current workaround you can register your own custom converter function in Java and reuse it across your mappings. In these Java converter functions, you have all the power of the Java language to extract the right property from the arbitrary list.
In order to find out how to implement your own custom converter function with Java, take a look here: https://github.com/eclipse/vorto/tree/master/mapping-engine#Advanced-Usage
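As a rough sketch of what the body of such a Java converter might do for the <length><frame type>[<sensor-id><sensor-value>] format from the question, the class below walks the frame and reads each value according to its sensor id. The sensor ids, value widths and scaling are invented for illustration, since the question does not give them, and the code shows only the byte parsing, not how the function is registered with the mapping engine (see the link above for that).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FrameParser {
    // Hypothetical sensor ids; the real device documentation defines these.
    static final int TEMPERATURE = 0x01; // signed 16-bit big-endian, tenths of a degree C
    static final int HUMIDITY = 0x02;    // unsigned 8-bit, percent
    static final int BATTERY = 0x03;     // unsigned 16-bit big-endian, millivolts

    // Parses one frame; readings may appear in any order.
    static Map<String, Double> parse(byte[] frame) {
        if (frame.length < 2 || (frame[0] & 0xFF) != frame.length) {
            throw new IllegalArgumentException("length byte does not match frame size");
        }
        Map<String, Double> readings = new LinkedHashMap<>();
        int pos = 2; // skip <length> and <frame type>
        while (pos < frame.length) {
            int sensorId = frame[pos++] & 0xFF;
            switch (sensorId) {
                case TEMPERATURE:
                    // The cast to short restores the sign of the 16-bit value.
                    readings.put("temperature", (short) readUnsigned16(frame, pos) / 10.0);
                    pos += 2;
                    break;
                case HUMIDITY:
                    requireBytes(frame, pos, 1);
                    readings.put("humidity", (double) (frame[pos] & 0xFF));
                    pos += 1;
                    break;
                case BATTERY:
                    readings.put("battery", (double) readUnsigned16(frame, pos));
                    pos += 2;
                    break;
                default:
                    // Without knowing the id we cannot know how many bytes to skip.
                    throw new IllegalArgumentException("unknown sensor id " + sensorId);
            }
        }
        return readings;
    }

    static int readUnsigned16(byte[] frame, int pos) {
        requireBytes(frame, pos, 2);
        return ((frame[pos] & 0xFF) << 8) | (frame[pos + 1] & 0xFF);
    }

    static void requireBytes(byte[] frame, int pos, int count) {
        if (pos + count > frame.length) {
            throw new IllegalArgumentException("truncated value at offset " + pos);
        }
    }

    public static void main(String[] args) {
        byte[] frame = {0x0A, 0x01, 0x02, 0x37, 0x03, 0x0E, 0x10, 0x01, 0x00, (byte) 0xEB};
        System.out.println(parse(frame));
    }
}
```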

Since the Eclipse Vorto 0.12.3 release, a fix for your request is available. With it, it is possible to pass an array object to a javascript converter as well as to use for loops inside javascript functions. You might want to give it a try.
See release notes https://github.com/eclipse/vorto/blob/master/docs/release-notes.md

How to retrieve all SQL-queries from Java source code?

We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script to find all SQL queries used in the projects' source code. Moreover, the projects make broad use of stored procedures.
What is an appropriate approach to do so?
@Override
public void update(int id, Entity entity) {
    jdbcTemplate.update(
        "UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
        HashMapBuilder.<String, Object>builder()
            .put("id", id)
            .put("enabled", entity.enabled)
            .build()
    );
}
It is the easiest case.
As a first step, we want to search the source code with regular expressions, matching SQL by a list of SQL keywords.

In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is the string intended to be SQL. That's sort of easy, since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly, and maybe that's good enough for you; I suggest hunting for all jdbc method call names.)
There's a worse problem: once you've figured out what the SQL string is, you now need to know whether it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) with an MSSQL-compliant parser (again, no regex can do context-free parsing) and complaining about the ones that don't parse.
Even that may not be enough if MSSQL has statements that look identical to Sybase statements but mean different things.
This is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities.)
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
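To collect the call sites for hand inspection, something like the following might be good enough as a first pass. It is a sketch that assumes the template variable is always named jdbcTemplate and looks only for a handful of JdbcTemplate method names; it reports where the calls are and makes no attempt to reconstruct the SQL string, for the reasons given above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdbcCallFinder {
    // Assumptions: the receiver is literally called "jdbcTemplate" and these
    // method names are the ones worth hunting for. Extend the list as needed.
    private static final Pattern CALL = Pattern.compile(
            "\\bjdbcTemplate\\s*\\.\\s*"
            + "(update|query|queryForObject|queryForList|execute|batchUpdate)\\s*\\(");

    // Returns one "lineNumber: methodName" entry per call found in the source text.
    static List<String> findCalls(String source) {
        List<String> hits = new ArrayList<>();
        String[] lines = source.split("\\R");
        for (int i = 0; i < lines.length; i++) {
            Matcher m = CALL.matcher(lines[i]);
            while (m.find()) {
                hits.add((i + 1) + ": " + m.group(1));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "public void update(int id, Entity entity) {",
                "    jdbcTemplate.update(",
                "        \"UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id\",",
                "        params);",
                "}");
        findCalls(source).forEach(System.out::println);
    }
}
```

Calls split across lines between the receiver and the method name, or made through a differently named variable, are missed; that is the "doing it badly" trade-off.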
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.

How to build a SQL string, with values filled in, from a statement with named parameters?

I'm looking for a utility to prepare a SQL statement with named parameters and values, but not execute it. I just want the resulting SQL statement, with values substituted for named params, as a java.lang.String object. I did not find anything in Spring or Apache Commons. [I know how to enable debug logging for java.sql.*] Because I'm querying a db instance on a mainframe, prepared statements are not allowed; the support has been disabled, for some strange reason. That decision is beyond my control or influence. Do you know of a utility that can help me? I guess I could roll my own utility if I had to, but I'd rather not.

This is like saying you'd like to see your Java code with the user's input hardcoded into the source.
That would be ridiculous, because you know that the user's input is never combined with the Java source. It's combined (so to speak) with the compiled Java app at runtime. There is never a time when input data becomes merged into the source.
It's the same way with prepared SQL statements.
During prepare(), the RDBMS receives the textual SQL string, and parses it internally and retains a sort of "bytecode" version of the query. In this internal representation, the parameter placeholders are noted, and the query cannot execute before values are provided.
During execute(), the RDBMS receives parameter values, and combines them with the bytecode. The parameters never see the original SQL text! That's the way it is supposed to work.

First, you should know that one of the reasons for prepared statements is security. The naive approach of simply replacing placeholders with the textual representation of the parameters and then sending a plain string has been the cause of many SQL injection attacks. A classic example is
SELECT * FROM tab WHERE tabid = ?
with a parameter of 1; DELETE FROM tab and textual replacement of parameters, you turn a simple query into a delete-all statement. Of course, real attacks can be much more clever than that ...
It is really strange that, for a mainframe database, someone recommends plain SQL statements over prepared statements. In my experience, security concerns lead to the opposite rule. You should really ask for the reason behind that, and what the recommended approach is. It could be the use of a special library, or a framework, or ... but if you can, do avoid textual replacement.
Edit:
If you are really stuck with textual replacement, you will have to roll your own utility. As explained above, I cannot imagine that a framework does that. In a real-world application, it can be made reasonably secure if you validate all the inputs to rule out any possibility of SQL injection (no special characters, or only at known places). But if I were in your place, I would not try to mimic SQL prepared statements; I would simply use String.format, where the format string is the SQL query with placeholders in Formatter syntax.
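A minimal sketch of the String.format approach, using the tab/tabid example from above plus a hypothetical name column. Numeric parameters are taken as long, so text such as 1; DELETE FROM tab cannot be passed at all, and string parameters go through a helper that doubles single quotes. That escaping assumes standard SQL string literals; check what your database actually requires before relying on it.

```java
import java.util.Locale;

class SqlFormatter {
    // Wraps a value in single quotes and doubles any embedded single quote,
    // which is the standard SQL way to escape a quote inside a string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // The "name" column is made up for the example. Locale.ROOT keeps the
    // number formatting independent of the machine's default locale.
    static String selectByIdAndName(long id, String name) {
        return String.format(Locale.ROOT,
                "SELECT * FROM tab WHERE tabid = %d AND name = %s",
                id, quote(name));
    }

    public static void main(String[] args) {
        System.out.println(selectByIdAndName(1, "O'Brien"));
    }
}
```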

How to script input for a Java program

I'm writing a Java program that requires its (technical) users to write scripts that it uses as input; it interprets these scripts into a series of actions and executes them. I am currently looking for the cleanest way to implement the script/configuration language. I was originally thinking of heading down the XML route, but the nature of the required input really is a procedural, linear flow of actions that need to be executed:
function move(Block b, Position p) {
// user-defined algorithm for moving block "b" to position "p"
}
Block a = getBlockA();
Position p = getPositionP();
move(a, p);
Etc. Please note: the above is an example only and does not constitute the exact syntax I am looking to achieve. I am still in the "30,000 ft view" design phase, and don't know what my concrete scripting language will ultimately look like. I only provide this example to show that it is a flow/procedural script that the users must write, and that XML is probably not the best candidate for its implementation.
XML, perfect for hierarchical data, just doesn't feel like the best choice for such an implementation (although I could force it to work if need be).
Not knowing a lick about DSLs, I've begun to read up on Groovy DSLs and they feel like a perfect match for what I need.
My understanding is that I could write, say, a Groovy (I'm stronger in Groovy than Scala, JRuby, etc.) DSL that would allow users to write scripts (.groovy files) that my program could then execute as input at runtime.
Is this correct, or am I misunderstanding the intent of DSLs altogether? If I am mistaken, does anybody have any suggestions for me? And if I am correct then how would a Java program read and execute a .groovy file (in other words, how would my program "consume" their script)?
Edit: I'm beginning to like ANTLR. Although I would love to roll up my sleeves and write a Groovy DSL, I don't want my users to be able to write any old Groovy program they want. I want my own "micro-language" and if users step outside of it I want the interpreter to invalidate the script. It's beginning to seem like Groovy/DSLs aren't the right choice, and maybe ANTLR could be the solution I need...?

I think you are on a really good path. Your users can write their files using your simple DSL, and then you can run them by evaluating them at runtime. Your biggest challenge will be helping them use the API of your DSL correctly. Unless they use an IDE, this will be pretty tough.
Equivalent of eval() in Groovy

Yes, you can write a Groovy program that will accept a script as input and execute it. I recently wrote a BASIC DSL/interpreter in this way using groovy :
http://cartesianproduct.wordpress.com/binsic-is-not-sinclair-instruction-code/
(In the end it was more interpreter than DSL but that was to do with a peculiarity of Groovy that likely won't affect you - BASIC insists on UPPER CASE keywords which Groovy finds hard to parse - hence they have to be converted to lower case).
Groovy allows you to extend the script environment in various ways (eg injecting variables into the binding and transferring execution from the current script to a different, dynamically loaded script) which make this relatively simple.

How can I support the SQL GO statement in a Java / jtds application?

I'm working on a Java-based OSS app, SqlHawk, one of whose features is running upgrade SQL scripts against a server.
Microsoft have made it a convention to split a script into batches with the GO statement, which is a good idea but just asking for false matches on the string.
At the moment I have a very rudimentary:
// split where GO on its own on a line
Pattern batchSplitter = Pattern.compile("^GO", Pattern.MULTILINE);
...
String[] splitSql = batchSplitter.split(definition);
...
which kind of works but is prone to being tripped up by things like quoted GO statements or indentation issues.
I think the only way to make this truly reliable is to have an SQL parser in the app, but I have no idea how to go about this, or whether that might actually end up being less reliable (especially given this tool supports multiple DBMSs).
What ways could I solve this problem? Code examples would be very helpful to me here.
Relevant sqlHawk code on github.
Currently using jtds to execute the batches found in the scripts.

GO is a client batch separator command. You can replace it with ;. It should not be sent in your EXEC dynamic SQL.
USE master
GO --<----- client actually sends the first batch to SQL and waits for a response
SELECT * from sys.databases
GO
should be translated into
Application.Exec("USE master");
Application.Exec("SELECT * from sys.databases");
or you can write it this way:
Application.Exec("USE master;SELECT * from sys.databases");
More about GO
http://msdn.microsoft.com/en-us/library/ms188037(v=sql.90).aspx

Ok, so this isn't going to be exactly what you want, but you might find it a start. I released SchemaEngine (which forms the core of most of my products) as open source here. In there, you will find C# code that does what you want very reliably (i.e. not tripping up on strings, comments, etc.). It also supports the 'GO x' syntax to repeat a batch x times.
If you download that and have a look in /Atlantis.SchemaEngine/Helpers you'll find a class called BatchParser.cs which contains a method called ParseBatches - which does pretty much what it says on the tin.
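A small Java sketch of the same idea, written for illustration and not a port of BatchParser.cs. It splits on lines that contain only GO (any case, optional indentation, optional repeat count, optional trailing -- comment), and ignores such lines while inside a single-quoted string or a /* */ comment. Nested block comments, bracketed identifiers and double-quoted identifiers are not handled, and the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoBatchSplitter {
    // GO alone on a line: optional indentation, optional repeat count,
    // optional trailing line comment.
    private static final Pattern GO_LINE = Pattern.compile(
            "^\\s*GO(?:\\s+(\\d+))?\\s*(?:--.*)?$", Pattern.CASE_INSENSITIVE);

    static List<String> split(String script) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inString = false;       // inside '...' (may span lines)
        boolean inBlockComment = false; // inside /* ... */

        for (String line : script.split("\\R", -1)) {
            Matcher go = GO_LINE.matcher(line);
            if (!inString && !inBlockComment && go.matches()) {
                int repeat = go.group(1) == null ? 1 : Integer.parseInt(go.group(1));
                String batch = current.toString().trim();
                // Empty batches are dropped; "GO n" emits the batch n times.
                for (int i = 0; i < repeat && !batch.isEmpty(); i++) {
                    batches.add(batch);
                }
                current.setLength(0);
                continue;
            }
            current.append(line).append('\n');
            // Update the string/comment state from the characters of this line.
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                char next = i + 1 < line.length() ? line.charAt(i + 1) : '\0';
                if (inBlockComment) {
                    if (c == '*' && next == '/') { inBlockComment = false; i++; }
                } else if (inString) {
                    // A doubled quote ('') leaves and re-enters the string.
                    if (c == '\'') { inString = false; }
                } else if (c == '-' && next == '-') {
                    break; // rest of the line is a comment
                } else if (c == '/' && next == '*') {
                    inBlockComment = true; i++;
                } else if (c == '\'') {
                    inString = true;
                }
            }
        }
        String last = current.toString().trim();
        if (!last.isEmpty()) {
            batches.add(last);
        }
        return batches;
    }

    public static void main(String[] args) {
        String script = "USE master\nGO\nSELECT * from sys.databases\ngo\n";
        split(script).forEach(batch -> System.out.println("[" + batch + "]"));
    }
}
```

Each returned batch can then be sent to the server as a separate statement, for example through jtds as the question already does.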
