I am using a Virtuoso SPARQL endpoint and querying it with Jena. I use QueryFactory and QueryExecutionFactory to create and run a SPARQL query:
Query query = QueryFactory.create(sparqlQueryString1);
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://localhost:8890/sparql", query);
ResultSet results = qexec.execSelect();
Now I want to measure the time taken to run this query. How does one find that using Jena against Virtuoso? Is it possible at all? I did look at methods like getTimeout1() and getTimeout2(), but they don't give me any useful direction (they deal with query timeouts rather than elapsed time). As a hack I tried Java's built-in System.currentTimeMillis(), but I am not sure that is the right way. Any pointers on how to find the execution time would be appreciated!
Results come back as a stream, so the timing needs to span from just before qexec.execSelect() until just after the application has finished handling the results; it is not enough to time the execSelect() call alone:
Timer timer = new Timer() ;
timer.startTimer() ;
ResultSet results = qexec.execSelect();
ResultSetFormatter.consume(results) ;   // Exhaust the result stream so every row is actually fetched.
long x = timer.finishTimer() ; // Time in milliseconds.
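For comparison, plain JDK timing gives the same kind of measurement, as long as it also spans consuming the results. A minimal sketch against the endpoint from the question (sparqlQueryString1 is the query string used above):

Query query = QueryFactory.create(sparqlQueryString1);
QueryExecution qexec = QueryExecutionFactory.sparqlService("http://localhost:8890/sparql", query);
long start = System.nanoTime();
ResultSet results = qexec.execSelect();
ResultSetFormatter.consume(results);                        // drain the stream: all rows fetched
long elapsedMs = (System.nanoTime() - start) / 1_000_000;   // full round-trip plus retrieval, in ms
qexec.close();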
It's not clear whether you want to time the full round-trip, or just the time Virtuoso itself spends on the query...
Virtuoso 7 lets you get the compilation (query plan) and execution time of a query using the profile function.
You can also enable general query logging and profiling using the prof_enable function.
I am trying to write points to InfluxDB using their Java client.
Batching is important to me.
If I use influxDB.enableBatch(...) together with influxDB.write(Point), no data is inserted.
If I use BatchPoints and influxDB.write(batchPoints), data is inserted successfully.
Both code samples are taken from: https://github.com/influxdata/influxdb-java/tree/influxdb-java-2.7
InfluxDB influxDB = InfluxDBFactory.connect(influxUrl, influxUser, influxPassword);
influxDB.setDatabase(dbName);
influxDB.setRetentionPolicy("autogen");
// Flush every 2000 Points, at least every 100ms
influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);
influxDB.write(Point.measurement("cpu")
.time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
.addField("idle", 90L)
.addField("user", 9L)
.addField("system", 1L)
.build());
Query query = new Query("SELECT idle FROM cpu", dbName);
QueryResult result = influxDB.query(query);
Returns nothing.
BatchPoints batchPoints = BatchPoints.database(dbName).tag("async", "true").build();
Point point1 = Point
.measurement("cpu")
.tag("atag", "test")
.addField("idle", 90L)
.addField("usertime", 9L)
.addField("system", 1L)
.build();
batchPoints.point(point1);
influxDB.write(batchPoints);
Query query = new Query("SELECT * FROM cpu ", dbName);
QueryResult result = influxDB.query(query);
This returns data successfully.
As mentioned, I need the first approach to work.
How can I achieve that?
versions:
influxdb-1.3.6
influxdb-java:2.7
Regards, Ido
Maybe it's too late or you have already resolved your issue, but I will answer your question anyway; it may be useful for others.
I think your first example appears not to work because you enabled batching, which will "Flush every 2000 Points, at least every 100ms". So it is actually working, but you are running your SELECT before the save has been performed.
When you use influxDB.enableBatch(...), the influxdb-java client creates an internal thread pool that stores your data once enough points have been collected or the flush interval has expired; the write does not happen immediately.
In the second example, when you use influxDB.write(batchPoints), the client writes your data to InfluxDB synchronously. That's why your SELECT statement is able to return data immediately.
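A quick way to see this is to give the batch processor a chance to flush before querying, or to disable batching, which flushes any buffered points. This is only a sketch, assuming influxdb-java 2.7; the connection details and database name are placeholders, and the Thread.sleep is purely for demonstration:

import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;
import org.influxdb.dto.QueryResult;

public class BatchWriteCheck {
    public static void main(String[] args) throws InterruptedException {
        InfluxDB influxDB = InfluxDBFactory.connect("http://localhost:8086", "user", "password");
        String dbName = "mydb";
        influxDB.setDatabase(dbName);

        // Flush every 2000 points, at least every 100 ms.
        influxDB.enableBatch(2000, 100, TimeUnit.MILLISECONDS);

        influxDB.write(Point.measurement("cpu")
                .time(System.currentTimeMillis(), TimeUnit.MILLISECONDS)
                .addField("idle", 90L)
                .build());

        // At this point the point is still buffered. Either wait past the flush interval...
        Thread.sleep(200);
        // ...or force the buffer out by turning batching off, which flushes pending points:
        // influxDB.disableBatch();

        QueryResult result = influxDB.query(new Query("SELECT idle FROM cpu", dbName));
        System.out.println(result);
    }
}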
I am facing a problem with Google BigQuery. I have a complex computation to perform, and the result needs to be saved in BigQuery. We are currently doing that computation in Java and saving the result to BigQuery with the help of Google Cloud Dataflow.
But this calculation takes around 28 minutes to complete in Java, and the customer requirement is to finish within 20 seconds.
So we switched to the BigQuery UDF options. One option is a BigQuery legacy UDF, but legacy UDFs have the limitation that they process one row at a time, and we need multiple rows to compute the result, so we ruled that option out.
The second option is a scalar UDF. But BigQuery scalar UDFs can, as far as I can tell, only be called from the web UI or the command line and cannot be triggered from the Java client.
If anyone has any idea, please point me in a direction on how to proceed with this problem.
You can use scalar UDFs with standard SQL from any client API, as long as the CREATE TEMPORARY FUNCTION statements are passed in the query attribute of the request. For example,
QueryRequest queryRequest =
QueryRequest
.newBuilder(
"CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
+ "SELECT COUNT(DISTINCT corpus) as works_with_fire\n"
+ "FROM `bigquery-public-data.samples.shakespeare`\n"
+ "WHERE word = GetWord();")
// Use standard SQL syntax for queries.
// See: https://cloud.google.com/bigquery/sql-reference/
.setUseLegacySql(false)
.build();
QueryResponse response = bigquery.query(queryRequest);
"BigQuery scalar UDFs can only be called from the web UI or the command line and cannot be triggered from the Java client."
This is not accurate. Standard SQL supports scalar UDFs through the CREATE TEMPORARY FUNCTION statement, which can be used from any application and any client; it is simply part of the SQL query:
https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
To learn how to enable Standard SQL, see this documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql
In particular, the simplest thing is to add the #standardSQL tag at the beginning of the SQL query.
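As a small sketch (the query text is just the sample from above), the tag goes on the first line of the query string you hand to the client, and it opts that single query into standard SQL even if the client defaults to legacy SQL:

String sql = "#standardSQL\n"
        + "CREATE TEMP FUNCTION GetWord() AS ('fire');\n"
        + "SELECT COUNT(DISTINCT corpus) AS works_with_fire\n"
        + "FROM `bigquery-public-data.samples.shakespeare`\n"
        + "WHERE word = GetWord();";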
I have a PL/SQL function that is called from our Java code.
I have the SQL_ID of the PL/SQL function execution, and I have access to the V$ views with my read-only DB user. The execution takes quite some time. Is there a way to profile the PL/SQL function execution to check where exactly the time is being spent?
I know how to do this for SQL queries with V$SQL, V$ACTIVE_SESSION_HISTORY and V$SESSION_LONGOPS, but I am unable to figure out how to do this for PL/SQL code.
The PL/SQL function takes 4 minutes to execute, so I can execute quite a few V$ queries manually in that time. What V$ views should I check to find a line in the execution plan/function? Is this even possible?
Maybe you can use DBMS_PROFILER for your problem, but if you want to use this method, you have to install some infrastructure first.
I don't want to describe the whole process of installing "proftab.sql" here; this link shows how it works.
It also shows some quick examples how to trace a specific function.
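Since the function is called from Java, one way to collect a profiler run is to wrap the call with DBMS_PROFILER start/stop calls on the same JDBC session. This is only a sketch: the connection details, the package/function name my_pkg.my_slow_function, and the run comment 'MYTEST' are placeholders, not anything from the question.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class ProfiledCall {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger")) {

            // Start a profiler run; run_comment1 lets you find it later in plsql_profiler_runs.
            try (CallableStatement start = con.prepareCall(
                    "begin dbms_profiler.start_profiler(run_comment => 'profiling run', run_comment1 => 'MYTEST'); end;")) {
                start.execute();
            }

            // Call the PL/SQL function under investigation (hypothetical name and signature).
            try (CallableStatement call = con.prepareCall(
                    "begin ? := my_pkg.my_slow_function(?); end;")) {
                call.registerOutParameter(1, Types.NUMERIC);
                call.setInt(2, 42);
                call.execute();
            }

            // Flush the collected data and stop the run.
            try (CallableStatement stop = con.prepareCall(
                    "begin dbms_profiler.flush_data; dbms_profiler.stop_profiler; end;")) {
                stop.execute();
            }
        }
    }
}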
I can provide my version of the analyze-query after some testing:
select ppr.runid,
       ppr.run_comment1,
       decode(nvl(ppd.total_occur, 0), 0, 0, ppd.total_time / ppd.total_occur / 1000000) as Avg_msek,
       ppd.total_time / 1000000 totaltime_msek,
       ppd.total_occur,
       uc.name,
       uc.line,
       uc.text as Source
  from plsql_profiler_data ppd, plsql_profiler_units ppu, user_source uc, plsql_profiler_runs ppr
 where ppd.runid = ppu.runid
   and ppu.runid = ppr.runid
   -- and ppr.run_comment1 = 'MYTEST'                              --show only a specific testrun
   -- and ppr.runid = (select max(runid) from plsql_profiler_runs) /*to get the last run*/
   and ppd.unit_number = ppu.unit_number
   and ppu.unit_name = uc.name
   and ppd.line#(+) = uc.line
   and uc.type in ('PACKAGE BODY', 'TYPE BODY')
   --order by uc.name, uc.line;          --Show all code by line
   --order by totaltime_msek desc;       --Sort by slowest lines
 order by total_occur desc, avg_msek desc --Sort by calls and slowest ones
I have a rather small graph containing roughly 500k triples. I've also generated the stats.opt file, and I'm running my code on a rather fast computer (quad core, 16 GB RAM, SSD). But for the query I'm building with the help of the Op interface, it takes forever to iterate over the result set. The result set has about 15,000 rows and the iteration takes 4 s, which is unacceptable for end users. Executing the query takes merely 90 ms (I guess the real work is done during the cursor iteration?). Why is this so slow, and what can I do to speed up the result set iteration?
Here is the query:
SELECT ?apartment ?price ?hasBalcony ?lat ?long ?label ?hasImage ?park ?supermarket ?rooms ?area ?street
WHERE
{ ?apartment dssd:hasBalcony ?hasBalcony .
?apartment wgs84:lat ?lat .
?apartment wgs84:long ?long .
?apartment rdfs:label ?label .
?apartment dssd:hasImage ?hasImage .
?apartment dssd:hasNearby ?hasNearbyPark .
?hasNearbyPark dssd:hasNearbyPark ?park .
?apartment dssd:hasNearby ?hasNearbySupermarket .
?hasNearbySupermarket dssd:hasNearbySupermarket ?supermarket .
?apartment dssd:price ?price .
?apartment dssd:rooms ?rooms .
?apartment dssd:area ?area .
?apartment vcard:hasAddress ?address .
?address vcard:streetAddress ?street
FILTER ( ?hasBalcony = true )
FILTER ( ?price <= 1000.0e0 )
FILTER ( ?price >= 650.0e0 )
FILTER ( ?rooms <= 4.0e0 )
FILTER ( ?rooms >= 3.0e0 )
FILTER ( ?area <= 100.0e0 )
FILTER ( ?area >= 60.0e0 )
}
(Is there a better way to query those bnodes, ?hasNearbyPark and ?hasNearbySupermarket?)
And the code to execute the query:
dataset.begin(ReadWrite.READ);
Model model = dataset.getNamedModel("http://example.com");
QueryExecution queryExecution = QueryExecutionFactory.create(buildQuery(), model);
ResultSet resultSet = queryExecution.execSelect();
while ( resultSet.hasNext() ) {
QuerySolution solution = resultSet.next(); ...
On the ARQ Query Engine
First off, you seem to be misunderstanding how the ARQ engine works:
ResultSet resultSet = queryExecution.execSelect();
All the above does is prepare a query plan for how the engine will evaluate the query; it does not actually evaluate the query, which is why it is almost instantaneous.
Actual work on answering your question does not happen until you start calling hasNext() and next():
while ( resultSet.hasNext() ) {
QuerySolution solution = resultSet.next(); ...
So the timings you quote are misleading: the query takes 4 s to evaluate, because that is how long it takes to iterate over all the results.
On your actual question
You haven't shown what your buildQuery() method does, but you say you are building the query as an Op structure programmatically rather than from a string? If that is the case, the query engine may not actually be applying optimization, though off the top of my head I don't think this is the issue. You can try adding op = Algebra.optimize(op); before you return the built Op, but I don't know that it will make much difference.
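For what it's worth, a minimal sketch of that call in isolation (here the Op is derived from a parsed query string purely for illustration; in your setup it would be whatever buildQuery() assembles, and the package prefix is org.apache.jena for current releases, com.hp.hpl.jena for older ones):

import org.apache.jena.query.QueryFactory;
import org.apache.jena.sparql.algebra.Algebra;
import org.apache.jena.sparql.algebra.Op;

public class OptimizeOpExample {
    public static void main(String[] args) {
        Op op = Algebra.compile(QueryFactory.create("SELECT * WHERE { ?s ?p ?o }"));
        op = Algebra.optimize(op);   // run ARQ's algebra-level optimizer over the Op
        System.out.println(op);      // prints the (optimized) algebra in SSE form
    }
}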
It looks like the optimizer should do a good job given just the raw query (not that your query has much scope for optimization other than join reordering), but if you are building it programmatically then you may be producing an unusual algebra that the optimizer struggles with.
Similarly, I'm not sure whether your stats.opt file will be honoured, because you query over a specific model rather than the TDB dataset, so the query engine used might be the general-purpose one rather than the TDB engine. I'm not an expert in TDB, so I can't tell whether this is the case.
Bottom Line
In general there is not enough information in your question to diagnose whether there is an actual issue in your setup or whether your query is just plain expensive. Reporting this as a minimal test case (minimal complete code plus sample data) to the users@jena.apache.org list for further analysis would be useful.
As a general comment on your query, lots of range filters are expensive to evaluate, which is likely where most of the time goes.
I'm trying to build a query that gets all rows from the Datastore, but my problem is that I have hundreds of rows, and when I run it once I almost hit the quota limits.
So my question is: what am I doing wrong?
Query query = new Query(myObject);
PreparedQuery pq = datastore.prepare(query);
QueryResultList<Entity> results = pq.asQueryResultList(fetchOptions);
resp.setContentType("text/plain");
resp.getWriter().println(results.size());
for (Entity entity : results) {
    resp.getWriter().println(entity.getProperty("name"));
}
What you have wrong is your approach. You can't do all of that in a single user-facing request with many rows, since the frontend request will time out.
Look at task queues and backends.
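As a sketch of one way to do this (the worker URL "/tasks/export", the kind name "myObject", and the page size are placeholders, not anything from the question): enqueue the scan on a task queue, where the request deadline is much longer, and walk the results in pages with a query cursor so no single fetch is huge.

import com.google.appengine.api.datastore.Cursor;
import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.QueryResultList;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class ExportWorker {

    // Called from the user-facing handler: hand the work off instead of doing it inline.
    public static void enqueue() {
        QueueFactory.getDefaultQueue().add(TaskOptions.Builder.withUrl("/tasks/export"));
    }

    // Runs inside the task handler, where the deadline is long enough to page through everything.
    public static void processAll() {
        DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
        FetchOptions fetchOptions = FetchOptions.Builder.withLimit(100);
        Cursor cursor = null;

        while (true) {
            if (cursor != null) {
                fetchOptions.startCursor(cursor);   // resume where the previous page ended
            }
            PreparedQuery pq = datastore.prepare(new Query("myObject"));
            QueryResultList<Entity> page = pq.asQueryResultList(fetchOptions);
            if (page.isEmpty()) {
                break;                              // no more results
            }
            for (Entity entity : page) {
                // handle entity.getProperty("name") here
            }
            cursor = page.getCursor();
        }
    }
}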