Stardog custom aggregate function unavailable in Jena - java

I've created a custom aggregate function in Stardog that calculates the standard deviation. This works great when you post SPARQL queries to the endpoint or via the query panel in the admin console.
So far, so good, but we're facing a couple of problems. First of all, when we execute a query like the following, it will execute perfectly via Stardog, but will fail in the SPARQL validator (and with the Jena API as well):
PREFIX : <http://our/namespace#>
PREFIX agg: <urn:aggregate:>
SELECT (agg:stardog:stdev(?age) AS ?stdLMD) (AVG(?age) AS ?avg)
WHERE {
  ?pat a :Person .
  ?pat :age ?age .
}
Stardog gives the correct results for standard deviation and average age, but the SPARQL validator throws an exception:
Non-group key variable in SELECT: ?age in expression (?age)
Does Stardog interpret the specification differently or is this a feature I'm unaware of?
Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to be working fine via the Stardog APIs. Most of our code, though, is based on Jena, and it doesn't seem to recognize the custom stdev function. I guess because this extension is only Stardog-related and unavailable for Jena? Let me show an example. At the moment, we're executing CONSTRUCT queries via the following Jena code:
final Query dbQuery = QueryFactory.create(query.getContent());
final QueryExecution queryExec = QueryExecutionFactory.create(dbQuery, model);
queryExec.execConstruct(infModel);
As long as we're not using the aggregate function, this works like a charm. As we're constructing triples in multiple named graphs, it's very convenient to have a model available as well (which represents a named graph).
I would like to do something similar with the Stardog java API. I've only gotten as far as:
UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}
The problem is that you need to explicitly specify which named graph you want to manipulate in the CONSTRUCT query. There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?
So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!
UPDATE
In the end, what we're trying to accomplish is to execute a CONSTRUCT query over a given named graph, but write the newly constructed triples to a different graph. In my Jena example, you can see that I'm working with two Jena models to accomplish that. How would you do this with the SNARL API? I've gotten as far as the following code snippet, but this only defines the dataset this query will be executed against, not where the triples will be written to. Any help on this is still appreciated!
UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    final DatasetImpl ds = new DatasetImpl();
    ds.addNamedGraph(new URIImpl(infDatasource));
    dbQuery.dataset(ds);
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}

The likely reason for the error
Non-group key variable in SELECT: ?age in expression (?age)
is that the SPARQL validator and ARQ have no idea that agg:stardog:stdev is an aggregate and do not interpret it as one. Syntactically, it is no different from a standard projection expression such as (?x + ?y AS ?sum), as AndyS noted.
While the SPARQL spec doesn't quite preclude custom aggregates, they're not accounted for in the grammar itself. Both Stardog and Jena allow custom aggregates, albeit in different ways.
Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to be working fine via the Stardog APIs. Most of our code, though, is based on Jena, and it doesn't seem to recognize the custom stdev function. I guess because this extension is only Stardog-related and unavailable for Jena?
Yes, Jena and Stardog are distinct. Anything custom you've defined in Stardog, such as a custom aggregate, won't be available directly in Jena.
You might be constructing the model in such a way that Jena, via ARQ, is the query engine as opposed to Stardog. That would explain why you get exceptions that Jena doesn't know about the custom aggregate you've defined within Stardog.
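For illustration, here is a minimal sketch of the difference; queryString stands for your SPARQL text, and the endpoint URL is a placeholder for your Stardog database's SPARQL endpoint:
// Evaluated locally by Jena's ARQ engine: Stardog's custom aggregate is unknown here.
Query dbQuery = QueryFactory.create(queryString);
QueryExecution localExec = QueryExecutionFactory.create(dbQuery, model);

// Evaluated by the remote endpoint: Stardog parses and executes the query itself,
// so its custom aggregates are available.
QueryExecution remoteExec =
        QueryExecutionFactory.sparqlService("http://localhost:5820/myDb/query", dbQuery);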
There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?
You can specify the active graph of a query programmatically via the SNARL API using dataset.
So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!
They're parsed differently because there's no standard way of defining a custom aggregate, and Stardog and Jena chose to implement it differently. Further, Jena is not aware of Stardog's custom aggregates, and vice versa.

Non-group key variable in SELECT: ?age in expression (?age)
Does Stardog interpret the specification differently or is this a feature I'm unaware of?
I think that you're reading the spec correctly, and that maybe the validator just doesn't recognize non-built-in aggregates. The spec says:
19.8 Grammar
… Aggregate functions can be one of the built-in keywords for aggregates or a custom aggregate, which is syntactically a function call. Aggregate functions may only be used in SELECT, HAVING and ORDER BY clauses.
As to the construct query:
Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to be working fine via the Stardog APIs. Most of our code, though, is based on Jena, and it doesn't seem to recognize the custom stdev function.
You didn't mention how you're using this. To use an aggregate within a construct pattern, you'd need to use a subquery. E.g., something like:
construct { ?s :hasStandardDeviation ?stddev }
where {{
  select ?s (agg:stddev(?value) as ?stddev) {
    ?s :hasSampleValue ?value
  }
  group by ?s
}}
There are some examples of this in SPARQL functions in CONSTRUCT/WHERE. Of course, if the validator rejects the first query, it probably rejects this one as well, but it looks like it should actually be legal. With Jena, you may need to make sure that you select a query language that allows extensions, but since the spec allows custom functions (when identified by IRIs), I'd think you should be able to use the standard SPARQL 1.1 language. You are using SPARQL 1.1 and not the earlier SPARQL spec, right?

Unless a custom aggregate is installed, the parser does not know it's an aggregate. Apache Jena ARQ does not have custom aggregates by default.
An aggregate by URI looks like a plain custom function. So if you have not installed that aggregate, the parser considers it to be a custom function.
The AVG forces an implicit grouping, so the custom function is then applied to a non-group-key variable, which is illegal.
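For completeness: newer Jena releases let you register a custom aggregate with ARQ so that the parser treats the IRI as an aggregate rather than a plain function. Below is a rough sketch assuming ARQ's AggregateRegistry API; accessor names (e.g. getExprList vs. getExpr) differ between versions, so treat it as a starting point rather than a drop-in implementation:
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.sparql.engine.binding.Binding;
import org.apache.jena.sparql.expr.NodeValue;
import org.apache.jena.sparql.expr.aggregate.Accumulator;
import org.apache.jena.sparql.expr.aggregate.AccumulatorFactory;
import org.apache.jena.sparql.expr.aggregate.AggregateRegistry;
import org.apache.jena.sparql.function.FunctionEnv;

public class RegisterStdev {
    public static void main(String[] args) {
        AccumulatorFactory factory = (agg, distinct) -> new Accumulator() {
            private final List<Double> values = new ArrayList<>();

            @Override
            public void accumulate(Binding binding, FunctionEnv env) {
                // Evaluate the aggregated expression for this row and collect it.
                values.add(agg.getExprList().get(0).eval(binding, env).getDouble());
            }

            @Override
            public NodeValue getValue() {
                // Sample standard deviation over the collected values.
                double mean = values.stream().mapToDouble(Double::doubleValue).average().orElse(0);
                double var = values.stream()
                        .mapToDouble(v -> (v - mean) * (v - mean))
                        .sum() / Math.max(values.size() - 1, 1);
                return NodeValue.makeDouble(Math.sqrt(var));
            }
        };
        // Register under the IRI the query uses; the last argument is the
        // value to return for an empty group.
        AggregateRegistry.register("urn:aggregate:stardog:stdev", factory,
                NodeValue.makeDouble(0).asNode());
    }
}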

Related

A very specific usage of callbacks in Java

This question is about a specific usage of a callback pattern. By callback I mean an interface from which I can define method(s) that are optionally called from a lower layer in my application (optional with a default of 'do nothing', thanks to Java 8 default methods). My "application" is in fact a product which may change a lot between client projects, so I need to separate things in order to reuse what won't change (technical code, integration of technologies) from the rest (model, rules).
Let's take an example:
I developed a search service which is based upon Apache CXF JAX-RS Search.
This service parses a FIQL query, which can only handle AND/OR conditions with =/</>/LIKE/... comparisons, to create a JPA criteria query. I can't use a condition like 'isNull'.
Using a specific interface, I can define a callback that will be called when I get the criteria query from the Apache CXF layer in my search service, so I can add my conditions to the existing ones before the query is executed. These conditions are defined in the upper layer of my search service (the RestController). This is in order to reduce code duplication, like returning a criteria query and finalizing it in every method where I need it. And because using @Transactional in a CXF JAX-RS controller does not play well with the way Spring proxies and CXF interact (some JAX-RS annotations are ignored).
First question: does this example seem to be a good idea in terms of design?
Now another example: I have an object which has some basic fields created from a service layer. But I want to be able to set other non-nullable fields, not related to the service's process, before the entity is persisted. These fields may change from one project to another, so I'd like not to have to change the signature of my service's method every time we add/remove columns. So again I'm considering a callback pattern, to be able to set them within the same transaction and before the object is persisted by the service layer.
Second question: what about this example?
Global question: apart from the classic usage of callbacks for events, is it good practice to use this pattern for such specific usages, or is there a better way to handle them?
If you need some code samples, ask me and I'll make some (I can't post my current code).
I wouldn't say that what you've described is a very specific usage of "an interface from which I can define method(s) that are optionally called from a lower layer". I think it is a reasonable and also quite common solution.
Your doubts may be due to the naming. I'd rather use the term command pattern here; it seems to me that it is less confusing. Your approach also resembles the strategy pattern, i.e. you provide (inject) an object which performs some calculation. Depending on the context, you inject objects that behave in different ways (for example, add different conditions to a query).
To sum up, callbacks/commands are not only used for events; I'd even say that events are a specific usage of them. The command/callback pattern is used whenever we need to encapsulate an operation within an object and transfer/pass it somehow (by the way, in Java there is no other way to do so, but in C++, for example, there are pointers to member functions, and in C# there are delegates...).
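To make this concrete, here is a minimal sketch of such a callback with a Java 8 default method as the "do nothing" behavior; all names are invented for illustration:
import java.util.ArrayList;
import java.util.List;

public class CallbackDemo {

    // The callback contract: thanks to the Java 8 default method, implementing
    // it is optional and the default behavior is "do nothing".
    interface SearchCustomizer {
        default void addConditions(List<String> conditions) { /* no-op */ }
    }

    // Lower layer (reusable product code): parses the query, then hands the
    // conditions to the callback before "executing".
    static class SearchService {
        String search(String fiqlCondition, SearchCustomizer customizer) {
            List<String> conditions = new ArrayList<>();
            conditions.add(fiqlCondition);        // condition parsed from FIQL
            customizer.addConditions(conditions); // project-specific additions
            return String.join(" AND ", conditions);
        }
    }

    public static void main(String[] args) {
        SearchService service = new SearchService();

        // A project that needs nothing extra relies on the default no-op.
        System.out.println(service.search("age>30", new SearchCustomizer() { }));

        // Another project injects an extra condition the FIQL layer cannot express.
        System.out.println(service.search("age>30", new SearchCustomizer() {
            @Override
            public void addConditions(List<String> conditions) {
                conditions.add("deletedAt IS NULL");
            }
        }));
    }
}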
As to your second example, I'm not sure I understand it correctly. Why can't you simply populate all required fields of the object before calling the service?

Dynamically build java/scala method body at runtime and execute it

Suppose I have the following Interface in java:
public interface DynamicMethod {
    String doit();
}
I would like to build an object at runtime which conforms to the above interface, such that I inject the doit method body into it and then execute it. Is this possible with the Java Reflection API, or in any other way? Or perhaps somehow in Scala?
Note that the doit body for my objects is dynamic and not known a priori. You can assume that at run time an array CodeArray[1..10] of strings is provided, and each entry of the array holds the code for one doit method. I would appreciate it if you could answer with sample code.
The context:
Let me explain the context of the problem; nonetheless, the above question remains independent of the context.
I have some commands, say C1, C2, ...; each command has certain parameters. Based on a command and its parameters, the system needs to perform a certain task (which is expressible in Java code). I need these commands to be stored for future execution on user demand (so CodeArray[1..10] above holds this list of Java code). For example, a user chooses a command from the list (i.e., from the array) and demands its execution.
My thought is to build an engine that, based on the user's selection, loads the corresponding command code from the array and executes it.
With the context that you added, it sounds to me like you have an Interpreter.
For example, SQL takes input like "SELECT * FROM users", parses and builds a tree of tokens that it then interprets.
Another example: Java's regex is an interpreter. A string like "[abc]+" is compiled into tokens, and then interpreted when executed. You can see the tokens (called Nodes) it uses in the source code.
I'll try to post a simple example later, but note that the Interpreter pattern doesn't use dynamically generated code: all of the tokens are concrete classes. You do have to define all possible (valid) user input so that you can make a token to execute it, however. SQL and regex have a defined syntax; you will need one also.
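Here is a minimal sketch of that idea; the commands are invented for illustration, and each one is a concrete class (or lambda), not generated code:
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class InterpreterDemo {

    // Every valid command is a concrete token; the engine only dispatches.
    interface Command {
        void execute(String[] args);
    }

    public static void main(String[] args) {
        Map<String, Command> commands = new HashMap<>();
        commands.put("print", a -> System.out.println(String.join(" ", a)));
        commands.put("repeat", a -> {
            for (int i = 0; i < Integer.parseInt(a[0]); i++) {
                System.out.println(a[1]);
            }
        });

        // "Parse" the user's input into a token and its arguments, then interpret.
        String input = "repeat 2 hello";
        String[] parts = input.split(" ");
        Command command = commands.get(parts[0]);
        command.execute(Arrays.copyOfRange(parts, 1, parts.length));
    }
}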
I think Byte Buddy would be helpful in your case. It's an open source project maintained by a very well respected Java developer.
Take a look at the Learn section; they have a very detailed example there:
http://bytebuddy.net/#/tutorial
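The canonical first example from that tutorial looks roughly like this (subclassing Object and replacing toString() at runtime; check the tutorial for the exact API of your Byte Buddy version):
import net.bytebuddy.ByteBuddy;
import net.bytebuddy.implementation.FixedValue;
import net.bytebuddy.matcher.ElementMatchers;

public class ByteBuddyDemo {
    public static void main(String[] args) throws Exception {
        // Generate a subclass of Object whose toString() returns a fixed value.
        Class<?> dynamicType = new ByteBuddy()
                .subclass(Object.class)
                .method(ElementMatchers.named("toString"))
                .intercept(FixedValue.value("Hello World!"))
                .make()
                .load(ByteBuddyDemo.class.getClassLoader())
                .getLoaded();
        System.out.println(dynamicType.getDeclaredConstructor().newInstance());
    }
}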
Currently it's not very clear what your aim is. There are many approaches to this, depending on your requirements.
In some cases it would be enough to create a Proxy and an InvocationHandler. Sometimes it's reasonable to generate Java source, then invoke the JavaCompiler at runtime and load the generated class using a URLClassLoader (that's probably your case, if you're speaking about strings of code). Sometimes it's better to create bytecode directly, using libraries like ASM, cglib or BCEL.
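For the Proxy route, here is a minimal sketch against the DynamicMethod interface from the question. Note the limitation: a proxy can dispatch to logic you already wrote, but it cannot compile an arbitrary Java string, which is where the JavaCompiler approach comes in. The array contents are placeholders:
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    public static void main(String[] args) {
        // Stand-in for CodeArray: each entry is just data the handler acts on,
        // not compiled Java code.
        String[] codeArray = { "result of C1", "result of C2" };
        int selected = 1; // the user's choice

        InvocationHandler handler = (proxy, method, methodArgs) -> {
            if (method.getName().equals("doit")) {
                return codeArray[selected];
            }
            throw new UnsupportedOperationException(method.getName());
        };

        DynamicMethod dm = (DynamicMethod) Proxy.newProxyInstance(
                DynamicMethod.class.getClassLoader(),
                new Class<?>[] { DynamicMethod.class },
                handler);

        System.out.println(dm.doit()); // prints "result of C2"
    }
}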

JDBC-simulator for non-DB structures

Is there a framework to quickly build a JDBC-like interface for an internal data structure?
We have a complex internal data structure for which we have to generate reports. The data itself is persisted in a database, but there is a large amount of Java code which manages the dependencies, access rights, data aggregation, etc. While it is theoretically possible to write all this code again in SQL, it would be much simpler if we could add a JDBC-like API to our application and point the reporting framework at that.
Especially since "JDBC" doesn't mean "SQL"; we could use something like Commons JXPath to query our model, or write our own simple query language.
[EDIT] What I'm looking for is something that implements most of the necessary boilerplate code, so you can write:
// Get column names and types from "Foo" by reflection
ReflectionResultSet rs = new ReflectionResultSet( Foo.class );
List<Foo> results = ...;
rs.setData( results );
and ReflectionResultSet takes care of cursor management, all the getters, etc.
It sounds like JoSQL (SQL for Java Objects) is exactly what you want.
Try googling "jdbc driver framework". The first result (for me) looks like a fit for you: http://jxdbcon.sourceforge.net/
Another option that might work (also in the Google results from the search above) is the Spring JDBC Template. Here is a writeup: http://www.zabada.com/tutorials/simplifying-jdbc-with-the-spring-jdbc-abstraction-framework.php
I think you'll have to create a new Driver implementation for your data structure. Usually, frameworks using JDBC just have to be provided a URL and a driver, so if you define your custom driver (and all the things that go with it, for example Connection), you'll be able to add the JDBC API you want.
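A skeleton of such a driver might look like the following; this is a sketch, the jdbc:internal: URL scheme is invented, and InternalModelConnection stands for the Connection implementation you would have to write:
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Properties;
import java.util.logging.Logger;

public class InternalModelDriver implements Driver {

    static {
        try {
            // Register so DriverManager.getConnection("jdbc:internal:...") finds us.
            DriverManager.registerDriver(new InternalModelDriver());
        } catch (SQLException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @Override
    public boolean acceptsURL(String url) {
        return url != null && url.startsWith("jdbc:internal:"); // invented scheme
    }

    @Override
    public Connection connect(String url, Properties info) throws SQLException {
        if (!acceptsURL(url)) {
            return null; // JDBC contract: return null for URLs we don't handle
        }
        return new InternalModelConnection(url); // hypothetical: your Connection impl
    }

    @Override
    public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) {
        return new DriverPropertyInfo[0];
    }

    @Override public int getMajorVersion() { return 1; }
    @Override public int getMinorVersion() { return 0; }
    @Override public boolean jdbcCompliant() { return false; } // no full SQL support

    @Override
    public Logger getParentLogger() throws SQLFeatureNotSupportedException {
        throw new SQLFeatureNotSupportedException();
    }
}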

How can I create a class file dynamically?

I want to create a class file dynamically. Here is the idea:
Given a ResultSet, by extracting its metadata I want to build a class file dynamically with getter and setter methods for all the columns that exist in the ResultSet. I should also be able to use this generated class wherever I want later.
Can anybody suggest a good way to implement this? Also, if any existing jar files are available for this, that would be helpful.
Perhaps Apache Commons BeanUtils might suit your requirements?
See the section on DynaBeans.
In particular:
3.3 ResultSetDynaClass (Wraps ResultSet in DynaBeans)
A very common use case for DynaBean APIs is to wrap other collections of "stuff" that do not normally present themselves as JavaBeans. One of the most common collections that would be nice to wrap is the java.sql.ResultSet that is returned when you ask a JDBC driver to perform a SQL SELECT statement. Commons BeanUtils offers a standard mechanism for making each row of the result set visible as a DynaBean, which you can utilize as shown in this example:
Connection conn = ...;
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("select account_id, name from customers");
Iterator rows = (new ResultSetDynaClass(rs)).iterator();
while (rows.hasNext()) {
    DynaBean row = (DynaBean) rows.next();
    System.out.println("Account number is " + row.get("account_id") +
                       " and name is " + row.get("name"));
}
rs.close();
stmt.close();
3.4 RowSetDynaClass (Disconnected ResultSet as DynaBeans)
Although ResultSetDynaClass is a very useful technique for representing the results of an SQL query as a series of DynaBeans, an important problem is that the underlying ResultSet must remain open throughout the period of time that the rows are being processed by your application. This hinders the ability to use ResultSetDynaClass as a means of communicating information from the model layer to the view layer in a model-view-controller architecture such as that provided by the Struts Framework, because there is no easy mechanism to assure that the result set is finally closed (and the underlying Connection returned to its connection pool, if you are using one).
The RowSetDynaClass class represents a different approach to this problem. When you construct such an instance, the underlying data is copied into a set of in-memory DynaBeans that represent the result. The advantage of this technique, of course, is that you can immediately close the ResultSet (and the corresponding Statement), normally before you even process the actual data that was returned. The disadvantage, of course, is that you must pay the performance and memory costs of copying the result data, and the result data must fit entirely into available heap memory. For many environments (particularly in web applications), this tradeoff is usually quite beneficial.
As an additional benefit, the RowSetDynaClass class is defined to implement java.io.Serializable, so that it (and the DynaBeans that correspond to each row of the result) can be conveniently serialized and deserialized (as long as the underlying column values are also Serializable). Thus, RowSetDynaClass represents a very convenient way to transmit the results of an SQL query to a remote Java-based client application (such as an applet).
The thing is, though: from the sounds of your situation, I understand that you want to create this class at runtime, based on the contents of a ResultSet that you just got back from a database query. This is all well and good, and can be done with bytecode manipulation.
However, what benefit do you perceive you will get from this? Your other code will not be able to call any methods on this class (because it did not exist when that code was compiled), and consequently the only way to actually use the generated class would be either via reflection or via methods on its parent class or implemented interfaces (I'm going to assume it would extend ResultSet). You can do the latter without bytecode weaving (look at dynamic proxies for arbitrary runtime implementations of an interface); and if you're doing the former, I don't see how having a class and mechanically calling the getFoo method through reflection is better than just calling resultSet.getString("foo") - it will be slower, clunkier and less type-safe.
So - are you sure you really want to create a class to achieve your goal?
You might want to look at BCEL, although I believe there are other bytecode manipulation libraries available too.
If you're using Java 6 you can write your code and directly call the Java compiler:
File[] files1 = ...; // input for first compilation task
File[] files2 = ...; // input for second compilation task
JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
StandardJavaFileManager fileManager = compiler.getStandardFileManager(null, null, null);
Iterable<? extends JavaFileObject> compilationUnits1 =
        fileManager.getJavaFileObjectsFromFiles(Arrays.asList(files1));
compiler.getTask(null, fileManager, null, null, null, compilationUnits1).call();
Iterable<? extends JavaFileObject> compilationUnits2 =
        fileManager.getJavaFileObjects(files2); // use alternative method
// reuse the same file manager to allow caching of jar files
compiler.getTask(null, fileManager, null, null, null, compilationUnits2).call();
fileManager.close();
You will then have to load said class, but you can do that easily enough with a class loader.
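For instance, if the compiler wrote its .class files to a known directory, the loading step might look like this (a sketch; the directory and class name are placeholders):
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// Assume the compiler wrote its output below ./generated (placeholder path).
URL[] classpath = { new File("generated").toURI().toURL() };
URLClassLoader loader = new URLClassLoader(classpath);
Class<?> generated = loader.loadClass("com.example.GeneratedBean"); // placeholder name
Object instance = generated.newInstance(); // then use it via reflection or an interface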
Sadly this is what you have to do in Java.
In C# you just use the 'var' type.
I'm confused about the way it's supposed to work, and I don't think it's possible.
Here's why:
If you want to use the class code in the rest of your application, you need an interface (or heavy use of reflection), and that would mean you know the column types beforehand - defeating the purpose of a generated class.
A generated class might clash at runtime with another one.
If you create a new class for each SQL call, you will end up with different classes for the same purpose. And these would probably not even pass a regular call to "equals".
You would have to look up classes from previously executed statements. And you lose flexibility and/or fill your heap with classes.
I've done something probably similar, but I wouldn't create dynamic classes.
I had an object called Schema that would load the data of each table I'd need.
I had a Table object that would have a Schema. Each Schema object had a columns attribute, while each Table object held attributes with values and references to the Schema's column attributes.
The Schema had everything you'd need to insert, select, delete, and update data in the database.
And I had a mediator that would handle the connection between the database and the Table object.
Table t = new Table("Dog");
t.randomValue(); // needed for the purpose of my project
t.save();
Table u = Table.get(t);
u.delete();
But it could have had something to get the value of a certain column by name easily.
Anyway, the principle is easy: my code would load the data contained in the table information_data; it could probably work with a DESCRIBE too.
I was able to load any table dynamically; since tables had dynamic attributes, the structure wasn't hardcoded. But there is no real need to create new classes for each table.
One more thing that could be important to note: each table schema was loaded once. Tables only had references to schemas, schemas had references to columns, columns had references to column types, etc...
It could have been interesting to find a better use for it than it had. I made it for unit tests on database replication; I had no real interest in coding a class for each of the 30 tables to do inserts/deletes/updates and selects. That's the only case where I can see it useful to create something dynamic around SQL: if you don't need to know anything about the tables and only want to insert/delete junk into them.
If I had to redo my code, I'd use more associative arrays.
Anyway, good luck.
I second the comments made by dtsazza and Stroboskop; generating a new class at run time is probably not what you want to do in this case.
You haven't really gotten into why you want to do this, but it sounds like you are trying to roll your own object-relational mapper. That is a problem that's much harder to get right than it first seems.
Instead of building your own system from the ground up, you might want to look into existing solutions like Hibernate (a high-level system that manages most of your objects and queries for you) or iBatis (a bit more low-level; it handles object mapping, but you still get to write your own SQL).
I have found that in JSF, beans and maps can be used interchangeably. Hence, for handling results where you don't want to build a complete set of getters/setters but just create an h:table, it is much easier to create a list with a map for each line, where the key is the column name (or number) and the value is the column content.
If you find it relevant later to make it more type-safe, you can rework the backend code with beans and keep your JSF code unchanged.
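Building that list-of-maps from a ResultSet takes only a few lines of standard JDBC (a sketch):
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
    ResultSetMetaData meta = rs.getMetaData();
    List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
    while (rs.next()) {
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            row.put(meta.getColumnLabel(i), rs.getObject(i)); // column name -> content
        }
        rows.add(row);
    }
    return rows; // each map backs one table row in the JSF page
}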

simple jdbc wrapper

To implement the data access code in our application, we need some framework to wrap around JDBC (ORM is not our choice, because of scalability).
The coolest framework I have worked with is Spring JDBC. However, my company's policy is to avoid external dependencies, especially Spring, J2EE, etc.
So we are thinking about writing our own hand-made JDBC framework, with functionality similar to Spring JDBC: row mapping, error handling, support for Java 5 features, but without transaction support.
Does anyone have experience writing such a JDBC wrapper framework?
If anyone has experience using other JDBC wrapper frameworks, please share it.
Thanks in advance.
We wrote our own wrapper. This topic is worthy of a paper, but I doubt I'll ever have time to write it, so here are some key points:
We embraced SQL and made no attempt to hide it. The only tweak was to add support for named parameters. Parameters are important because we do not encourage the use of on-the-fly SQL (for security reasons), and we always use PreparedStatements.
For connection management, we used Apache DBCP. This was convenient at the time, but it's unclear how much of it is needed with modern JDBC implementations (the documentation on this stuff is lacking). DBCP also pools PreparedStatements.
We didn't bother with row mapping. Instead (for queries) we used something similar to Apache DbUtils' ResultSetHandler, which allows you to "feed" the result set into a method which can then dump the information wherever you'd like (see the sketch below). This is more flexible, and in fact it wouldn't be hard to implement a ResultSetHandler for row mapping. For inserts/updates we created a generic record class (basically a hashmap with some extra bells and whistles). The biggest problem with row mapping (for us) is that you're stuck as soon as you do an "interesting" query, because you may have fields that map to different classes; because you may have a hierarchical class structure but a flat result set; or because the mapping is complex and data-dependent.
We built in error logging. For exception handling: on a query we trap and log, but for an update we trap, log, and rethrow an unchecked exception.
We provided transaction support using a wrapper approach: the caller provides the code that performs the transaction, and we make sure that the transaction is properly managed, with no chance of forgetting to finish it, and with rollback and error handling built in.
Later on, we added a very simplistic relationship scheme that allows a single update/insert to apply to a record and all its dependencies. To keep things simple, we did not use this for queries, and we specifically decided not to support it for deletes, because it is more reliable to use cascaded deletes.
This wrapper has been successfully used in two projects to date. It is, of course, lightweight, but these days everyone says their code is lightweight. More importantly, it increases programmer productivity, decreases the number of bugs (and makes problems easier to track down), and it's relatively easy to trace through if need be because we don't believe in adding lots of layers just to provide beautiful architecture.
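To illustrate the "feed the result set into a method" point above, a bare-bones version (names invented; Apache DbUtils' ResultSetHandler has the same shape) could look like this:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public final class Db {

    // The caller supplies this; the wrapper owns statement and result-set cleanup.
    public interface ResultSetHandler<T> {
        T handle(ResultSet rs) throws SQLException;
    }

    public static <T> T query(Connection conn, String sql,
                              ResultSetHandler<T> handler, Object... params)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                ps.setObject(i + 1, params[i]); // always PreparedStatement parameters
            }
            try (ResultSet rs = ps.executeQuery()) {
                return handler.handle(rs);
            }
        }
    }
}
The handler decides where the data goes, so row mapping becomes just one possible handler among many.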
Spring JDBC is fantastic. Consider that for an open-source project like Spring, the downside of an external dependency is minimized: you can adopt the most stable version of Spring that satisfies your JDBC abstraction requirements, and you know that you'll always be able to modify the source code yourselves if you ever run into an issue -- without depending on an external party. You can also examine the implementation for any security concerns that your organization might have with code written by an external party.
The one I prefer: Dalesbred. It's MIT licensed.
A simple example of getting all rows for a custom class (Department).
List<Department> departments = db.findAll(Department.class,
        "select id, name from department");
when the custom class is defined as:
public final class Department {
    private final int id;
    private final String name;

    public Department(int id, String name) {
        this.id = id;
        this.name = name;
    }
}
Disclaimer: it's by a company I work for.
This sounds like a very short-sighted decision. Consider the cost of developing/maintaining such a framework, especially when you can get one, and its source code, for free. Not only do you not have to do the development yourself, you can also modify it at will if need be.
That being said, what you really need to duplicate is the notion of JdbcTemplate and its callbacks (PreparedStatementCreator, PreparedStatementCallback), as well as RowMapper/RowCallbackHandler (a sketch follows). It shouldn't be too complicated to write something like this (especially considering you don't have to do transaction management).
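Such a minimal JdbcTemplate/RowMapper pair might look like this (a sketch, not Spring's actual signatures):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public final class MiniJdbcTemplate {

    public interface RowMapper<T> {
        T mapRow(ResultSet rs, int rowNum) throws SQLException;
    }

    private final DataSource dataSource;

    public MiniJdbcTemplate(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public <T> List<T> query(String sql, RowMapper<T> mapper, Object... params) {
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                ps.setObject(i + 1, params[i]);
            }
            try (ResultSet rs = ps.executeQuery()) {
                List<T> results = new ArrayList<>();
                for (int row = 0; rs.next(); row++) {
                    results.add(mapper.mapRow(rs, row));
                }
                return results;
            }
        } catch (SQLException e) {
            // Like Spring, convert checked SQLExceptions into unchecked ones.
            throw new RuntimeException(e);
        }
    }
}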
However, as I've said, why write it when you can get it for free and modify the source code as you see fit?
Try JdbcSession from jcabi-jdbc. It's as simple as JDBC should be. For example:
String name = new JdbcSession(source)
    .sql("SELECT name FROM foo WHERE id = ?")
    .set(123)
    .select(new SingleOutcome<String>(String.class));
That's it.
Try my library as an alternative:
<dependency>
    <groupId>com.github.buckelieg</groupId>
    <artifactId>jdbc-fn</artifactId>
    <version>0.2</version>
</dependency>
More info here
Jedoo
There is a wrapper class called Jedoo out there that uses database connection pooling and a singleton pattern to access it as a shared variable. It has plenty of functions for running queries quickly.
Usage
To use it, you should add it to your project and load its singleton in a Java class:
import static com.pwwiur.util.database.Jedoo.database;
Using it is pretty easy as well:
if (database.count("users") < 100) {
    long id = database.insert("users", new Object[][] {
        {"name", "Amir"},
        {"username", "amirfo"}
    });
    database.setString("users", "name", "Amir Forsati", id);
    try (ResultSetHandler rsh = database.all("users")) {
        while (rsh.next()) {
            System.out.println("User ID:" + rsh.getLong("id"));
            System.out.println("User Name:" + rsh.getString("name"));
        }
    }
}
There are also some useful functions that you can find in the documentation linked above.
mJDBC: https://mjdbc.github.io/
I have used it for years and have found it very useful (I'm the author of this library).
It is inspired by the JDBI library but has no dependencies, adds transaction support, provides performance counters, and makes it easy to drop down to the lowest possible SQL level in Java (the plain old JDBC API) if you really need it.
