How to get all of the subjects of a Jena Query? - java

Suppose I have some jena query object :
String query = "SELECT * WHERE{ ?s <some_uri> ?o ...etc. }";
Query q = QueryFactory.create(query, Syntax.syntaxARQ);
What would be the best way to get all of the subjects of the triples in the query? Preferably without having to do any string parsing/manipulation manually.
For example, given a query
SELECT * WHERE {
?s ?p ?o;
?p2 ?o2.
?s2 ?p3 ?o3.
?s3 ?p4 ?o4.
<http://example.com> ?p5 ?o5.
}
I would hope to have returned some list which looks like
[?s, ?s2, ?s3, <http://example.com>]
In other words, I want the list of all subjects in a query. Even having only those subjects which were variables or those which were literals/uris would be useful, but I'd like to find a list of all of the subjects in the query.
I know there are methods to return the result variables (Query.getResultVars) and some other information (see http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/query/Query.html), but I can't seem to find anything which will get specifically the subjects of the query (a list of all result variables would return the predicates and objects as well).
Any help appreciated.

Interesting question. What you need to do is go through the query, and for each block of triples iterate through and look at the first part.
The most robust way to do this is via an element walker which will go through each part of the query. It might seem over the top in your case, but queries can contain all sorts of things, including FILTERs, OPTIONALs, and nested SELECTs. Using the walker means that you can ignore that stuff and focus on only what you want:
Query q = QueryFactory.create(query); // SPARQL 1.1
// Remember distinct subjects in this
final Set<Node> subjects = new HashSet<Node>();
// This will walk through all parts of the query
ElementWalker.walk(q.getQueryPattern(),
// For each element...
new ElementVisitorBase() {
// ...when it's a block of triples...
public void visit(ElementPathBlock el) {
// ...go through all the triples...
Iterator<TriplePath> triples = el.patternElts();
while (triples.hasNext()) {
// ...and grab the subject
subjects.add(triples.next().getSubject());
}
}
}
);

It might be too late but another way is to make use of Jena ARQ libraries and create Algebra of the given query. Once the algebra is created, it can be compiled and you can traverse through all the triples (given in the where clause). Here is the code, I hope it helps:
Query query = qExec.getQuery(); //qExec is an object of QueryExecutionFactory
// Generate algebra of the query
Op op = Algebra.compile(query);
CustomOpVisitorBase opVisitorBase = new CustomOpVisitorBase();
opVisitorBase.opVisitorWalker(op);
List<Triple> queryTriples = opVisitorBase.triples;
CustomOpVisitor class is given below:
public class CustomOpVisitorBase extends OpVisitorBase {
List<Triple> triples = null;
void opVisitorWalker(Op op) {
OpWalker.walk(op, this);
}
#Override
public void visit(final OpBGP opBGP) {
triples = opBGP.getPattern().getList();
}
}
Traverse through the list of Triples and make use of given property functions such as triple.getSubject() etc etc.

Related

Lucene 6 - How to influence ranking with numeric value?

I am new to Lucene, so apologies for any unclear wording. I am working on an author search engine. The search query is the author name. The default search results are good - they return the names that match the most. However, we want to rank the results by author popularity as well, a blend of both the default similarity and a numeric value representing the circulations their titles have. The problem with the default results is it returns authors nobody is interested in, and while I can rank by circulation alone, the top result is generally not a great match in terms of name. I have been looking for days for a solution for this.
This is how I am building my index:
IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get(INDEX_LOCATION)),
new IndexWriterConfig(new StandardAnalyzer()));
writer.deleteAll();
for (Contributor contributor : contributors) {
Document doc = new Document();
doc.add(new TextField("name", contributor.getName(), Field.Store.YES));
doc.add(new StoredField("contribId", contributor.getContribId()));
doc.add(new NumericDocValuesField("sum", sum));
writer.addDocument(doc);
}
writer.close();
The name is the field we want to search on, and the sum is the field we want to weight our search results with (but still taking into account the best match for the author name). I'm not sure if adding the sum to the document is the correct thing to do in this situation. I know that there will need to be some experimentation to figure out how to best blend the weighting of the two factors, but my problem is I don't know how to do it in the first place.
Any examples I've been able to find are either pre-Lucene 4 or don't seem to work. I thought this was what I was looking for, but it doesn't seem to work. Help appreciated!
As demonstrated in the blog post you linked, you could use a CustomScoreQuery; this would give you a lot of flexibility and influence over the scoring process, but it is also a bit overkill. Another possibility is to use a FunctionScoreQuery; since they behave differently, I will explain both.
Using a FunctionScoreQuery
A FunctionScoreQuery can modify a score based on a field.
Let's say you create you are usually performing a search like this:
Query q = .... // pass the user input to the QueryParser or similar
TopDocs hits = searcher.search(query, 10); // Get 10 results
Then you can modify the query in between like this:
Query q = .....
// Note that a Float field would work better.
DoubleValuesSource boostByField = DoubleValuesSource.fromLongField("sum");
// Create a query, based on the old query and the boost
FunctionScoreQuery modifiedQuery = new FunctionScoreQuery(q, boostByField);
// Search as usual
TopDocs hits = searcher.search(query, 10);
This will modify the query based on the value of field. Sadly, however, there isn't a possibility to control the influence of the DoubleValuesSource (besides by scaling the values during indexing) - at least none that I know of.
To have more control, consider using the CustomScoreQuery.
Using a CustomScoreQuery
Using this kind of query will allow you to modify a score of each result any way you like. In this context we will use it to alter the score based on a field in the index. First, you will have to store your value during indexing:
doc.add(new StoredField("sum", sum));
Then we will have to create our very own query class:
private static class MyScoreQuery extends CustomScoreQuery {
public MyScoreQuery(Query subQuery) {
super(subQuery);
}
// The CustomScoreProvider is what actually alters the score
private class MyScoreProvider extends CustomScoreProvider {
private LeafReader reader;
private Set<String> fieldsToLoad;
public MyScoreProvider(LeafReaderContext context) {
super(context);
reader = context.reader();
// We create a HashSet which contains the name of the field
// which we need. This allows us to retrieve the document
// with only this field loaded, which is a lot faster.
fieldsToLoad = new HashSet<>();
fieldsToLoad.add("sum");
}
#Override
public float customScore(int doc_id, float currentScore, float valSrcScore) throws IOException {
// Get the result document from the index
Document doc = reader.document(doc_id, fieldsToLoad);
// Get boost value from index
IndexableField field = doc.getField("sum");
Number number = field.numericValue();
// This is just an example on how to alter the current score
// based on the value of "sum". You will have to experiment
// here.
float influence = 0.01f;
float boost = number.floatValue() * influence;
// Return the new score for this result, based on the
// original lucene score.
return currentScore + boost;
}
}
// Make sure that our CustomScoreProvider is being used.
#Override
public CustomScoreProvider getCustomScoreProvider(LeafReaderContext context) {
return new MyScoreProvider(context);
}
}
Now you can use your new Query class to modify an existing query, similar to the FunctionScoreQuery:
Query q = .....
// Create a query, based on the old query and the boost
MyScoreQuery modifiedQuery = new MyScoreQuery(q);
// Search as usual
TopDocs hits = searcher.search(query, 10);
Final remarks
Using a CustomScoreQuery, you can influence the scoring process in all kinds of ways. Remember however that the method customScore is called for each search result - so don't perform any expensive computations there, as this would severely slow down the search process.
I've creating a small gist of a full working example of the CustomScoreQuery here: https://gist.github.com/philippludwig/14e0d9b527a6522511ae79823adef73a

Hibernate row count on Criteria with already set Projection

For a grid component I have in my web applications I have a "GridModel" class which gets passed a Criteria.
The GridModel class has a method to get the results for a specific page by adding setFirstResult(...) and setMaxResults(...) to the Criteria.
But I also need the total count of rows for the Criteria, so I have the following method:
public int getAvailableRows() {
Criteria c = criteriaProvider.getCriteria();
c.setProjection(Projections.rowCount());
return((Long)c.uniqueResult()).intValue();
}
This worked perfectly, but now I have a grid that requires a Criteria that already uses setProjection() in combination with setResultTransformer(). It seems that the getAvailableRows() method above overrides the setProjection() of the original Criteria creating wrong results.
Can I wrap a count Criteria around the original Criteria instead somehow? Or how would I solve this?
I've had a similar experience when trying to use the Projections.rowCount() in conjunction with a groupBy expression. I was able to circumvent things in a slightly 'hacky' manner by:
Remembering the previous projection and result transformer
Setting the projection on the Criteria to be a modified version (see below)
Perform the row count DB hit
Restore the previous projection + transformer so the Criteria can be used for actual result retrieving if
final Projection originalProjection = criteriaImpl.getProjection();
final ResultTransformer originalResultTransformer =
criteriaImpl.getResultTransformer();
final Projection rowCountProjection;
// If we identify that we have a function with a group by clause
// we need to handle it in a special fashion
if ( originalProjection != null && originalProjection.isGrouped() )
{
final CriteriaQueryTranslator criteriaQueryTranslator =
new CriteriaQueryTranslator(
(SessionFactoryImplementor)mySessionFactory,
criteriaImpl,
criteriaImpl.getEntityOrClassName(),
CriteriaQueryTranslator.ROOT_SQL_ALIAS );
rowCountProjection = Projections.projectionList()
.add( Projections.rowCount() )
.add( Projections.sqlGroupProjection(
// This looks stupid but is seemingly required to ensure we have a valid query
"count(count(1))",
criteriaQueryTranslator.getGroupBy(),
new String[]{}, new Type[]{} ) );
}
else
{
rowCountProjection = Projections.rowCount();
}
// Get total count of elements by setting a count projection
final Long rowCount =
(Long)criteria.setProjection( rowCountProjection ).uniqueResult();
A few caveats here:
This still wont give the expected results if you try and give it a criteria with a single sum projection as that is not considered an isGrouped() projection - it will splat the sum with a count. I don't consider this an issue because getting the rowcount for an expression of that nature probably doesnt make sense
When I was dealing with this I wrote some unit tests to make sure rowcount was as expected without projections, with property based projections and with groupby projections but I've written this from memory so can't guarantee small kinks won't need ironing out

Is there a way to get the count size for a JPA Named Query with a result set?

I like the idea of Named Queries in JPA for static queries I'm going to do, but I often want to get the count result for the query as well as a result list from some subset of the query. I'd rather not write two nearly identical NamedQueries. Ideally, what I'd like to have is something like:
#NamedQuery(name = "getAccounts", query = "SELECT a FROM Account")
.
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = q.getCount();
So let's say m is 10, s is 0 and there are 400 rows in Account. I would expect r to have a list of 10 items in it, but I'd want to know there are 400 rows total. I could write a second #NamedQuery:
#NamedQuery(name = "getAccountCount", query = "SELECT COUNT(a) FROM Account")
but it seems a DRY violation to do that if I'm always just going to want the count. In this simple case it is easy to keep the two in sync, but if the query changes, it seems less than ideal that I have to update both #NamedQueries to keep the values in line.
A common use case here would be fetching some subset of the items, but needing some way of indicating total count ("Displaying 1-10 of 400").
So the solution I ended up using was to create two #NamedQuerys, one for the result set and one for the count, but capturing the base query in a static string to maintain DRY and ensure that both queries remain consistent. So for the above, I'd have something like:
#NamedQuery(name = "getAccounts", query = "SELECT a" + accountQuery)
#NamedQuery(name = "getAccounts.count", query = "SELECT COUNT(a)" + accountQuery)
.
static final String accountQuery = " FROM Account";
.
Query q = em.createNamedQuery("getAccounts");
List r = q.setFirstResult(s).setMaxResults(m).getResultList();
int count = ((Long)em.createNamedQuery("getAccounts.count").getSingleResult()).intValue();
Obviously, with this example, the query body is trivial and this is overkill. But with much more complex queries, you end up with a single definition of the query body and can ensure you have the two queries in sync. You also get the advantage that the queries are precompiled and at least with Eclipselink, you get validation at startup time instead of when you call the query.
By doing consistent naming between the two queries, it is possible to wrap the body of the code to run both sets just by basing the base name of the query.
Using setFirstResult/setMaxResults do not return a subset of a result set, the query hasn't even been run when you call these methods, they affect the generated SELECT query that will be executed when calling getResultList. If you want to get the total records count, you'll have to SELECT COUNT your entities in a separate query (typically before to paginate).
For a complete example, check out Pagination of Data Sets in a Sample Application using JSF, Catalog Facade Stateless Session, and Java Persistence APIs.
oh well you can use introspection to get named queries annotations like:
String getNamedQueryCode(Class<? extends Object> clazz, String namedQueryKey) {
NamedQueries namedQueriesAnnotation = clazz.getAnnotation(NamedQueries.class);
NamedQuery[] namedQueryAnnotations = namedQueriesAnnotation.value();
String code = null;
for (NamedQuery namedQuery : namedQueryAnnotations) {
if (namedQuery.name().equals(namedQueryKey)) {
code = namedQuery.query();
break;
}
}
if (code == null) {
if (clazz.getSuperclass().getAnnotation(MappedSuperclass.class) != null) {
code = getNamedQueryCode(clazz.getSuperclass(), namedQueryKey);
}
}
//if not found
return code;
}

lucene ignore queries on fields other than default

i have 2 indexes, one for meta data and one for text, i want to be able to remove all field searches in the query and only use the default fields that the user searched, ie "help AND title:carpool" i want only the help part, ideas?
Traverse over tree of BooleanQuery and remove entries related Term("help")
This is a ballpark of what your code should look like:
public static void removeNonDefault(BooleanQuery query, String defaultField) {
List<BooleanClause> clauses = (List<BooleanClause>)query.clauses();
Iterator<BooleanClause> iter = clauses.iterator();
while(iter.hasNext()) {
BooleanClause clause = iter.next();
Query subQuery = clause.getQuery();
if(subQuery instanceof BooleanQuery) {
removeNonDefault((BooleanQuery)subQuery, defaultField);
} else if(subQuery instanceof TermQuery) {
if (!((TermQuery) subQuery).getTerm().field().equals(defaultField)) {
iter.remove();
}
}
}
}
What this does is removes TermQuerys with the non-default field from the BooleanQuery, and recurses down into sub-boolean queries.
Note that this code is not complete. Depending on your situation, there might be more types of queries you should worry about, like phrase queries and constant score range queries.
Make sure to do query.rewrite() before you call this function, to convert any wildcard queries to boolean queries.

How to get distinct results in hibernate with joins and row-based limiting (paging)?

I'm trying to implement paging using row-based limiting (for example: setFirstResult(5) and setMaxResults(10)) on a Hibernate Criteria query that has joins to other tables.
Understandably, data is getting cut off randomly; and the reason for that is explained here.
As a solution, the page suggests using a "second sql select" instead of a join.
How can I convert my existing criteria query (which has joins using createAlias()) to use a nested select instead?
You can achieve the desired result by requesting a list of distinct ids instead of a list of distinct hydrated objects.
Simply add this to your criteria:
criteria.setProjection(Projections.distinct(Projections.property("id")));
Now you'll get the correct number of results according to your row-based limiting. The reason this works is because the projection will perform the distinctness check as part of the sql query, instead of what a ResultTransformer does which is to filter the results for distinctness after the sql query has been performed.
Worth noting is that instead of getting a list of objects, you will now get a list of ids, which you can use to hydrate objects from hibernate later.
I am using this one with my code.
Simply add this to your criteria:
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
that code will be like the select distinct * from table of the native sql.
A slight improvement building on FishBoy's suggestion.
It is possible to do this kind of query in one hit, rather than in two separate stages. i.e. the single query below will page distinct results correctly, and also return entities instead of just IDs.
Simply use a DetachedCriteria with an id projection as a subquery, and then add paging values on the main Criteria object.
It will look something like this:
DetachedCriteria idsOnlyCriteria = DetachedCriteria.forClass(MyClass.class);
//add other joins and query params here
idsOnlyCriteria.setProjection(Projections.distinct(Projections.id()));
Criteria criteria = getSession().createCriteria(myClass);
criteria.add(Subqueries.propertyIn("id", idsOnlyCriteria));
criteria.setFirstResult(0).setMaxResults(50);
return criteria.list();
A small improvement to #FishBoy's suggestion is to use the id projection, so you don't have to hard-code the identifier property name.
criteria.setProjection(Projections.distinct(Projections.id()));
The solution:
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
works very well.
session = (Session) getEntityManager().getDelegate();
Criteria criteria = session.createCriteria(ComputedProdDaily.class);
ProjectionList projList = Projections.projectionList();
projList.add(Projections.property("user.id"), "userid");
projList.add(Projections.property("loanState"), "state");
criteria.setProjection(Projections.distinct(projList));
criteria.add(Restrictions.isNotNull("this.loanState"));
criteria.setResultTransformer(Transformers.aliasToBean(UserStateTransformer.class));
This helped me :D
if you want to use ORDER BY, just add:
criteria.setProjection(
Projections.distinct(
Projections.projectionList()
.add(Projections.id())
.add(Projections.property("the property that you want to ordered by"))
)
);
I will now explain a different solution, where you can use the normal query and pagination method without having the problem of possibly duplicates or suppressed items.
This Solution has the advance that it is:
faster than the PK id solution mentioned in this article
preserves the Ordering and don’t use the 'in clause' on a possibly large Dataset of PK’s
The complete Article can be found on my blog
Hibernate gives the possibility to define the association fetching method not only at design time but also at runtime by a query execution. So we use this aproach in conjunction with a simple relfection stuff and can also automate the process of changing the query property fetching algorithm only for collection properties.
First we create a method which resolves all collection properties from the Entity Class:
public static List<String> resolveCollectionProperties(Class<?> type) {
List<String> ret = new ArrayList<String>();
try {
BeanInfo beanInfo = Introspector.getBeanInfo(type);
for (PropertyDescriptor pd : beanInfo.getPropertyDescriptors()) {
if (Collection.class.isAssignableFrom(pd.getPropertyType()))
ret.add(pd.getName());
}
} catch (IntrospectionException e) {
e.printStackTrace();
}
return ret;
}
After doing that you can use this little helper method do advise your criteria object to change the FetchMode to SELECT on that query.
Criteria criteria = …
// … add your expression here …
// set fetchmode for every Collection Property to SELECT
for (String property : ReflectUtil.resolveCollectionProperties(YourEntity.class)) {
criteria.setFetchMode(property, org.hibernate.FetchMode.SELECT);
}
criteria.setFirstResult(firstResult);
criteria.setMaxResults(maxResults);
criteria.list();
Doing that is different from define the FetchMode of your entities at design time. So you can use the normal join association fetching on paging algorithms in you UI, because this is most of the time not the critical part and it is more important to have your results as quick as possible.
Below is the way we can do Multiple projection to perform Distinct
package org.hibernate.criterion;
import org.hibernate.Criteria;
import org.hibernate.Hibernate;
import org.hibernate.HibernateException;
import org.hibernate.type.Type;
/**
* A count for style : count (distinct (a || b || c))
*/
public class MultipleCountProjection extends AggregateProjection {
private boolean distinct;
protected MultipleCountProjection(String prop) {
super("count", prop);
}
public String toString() {
if(distinct) {
return "distinct " + super.toString();
} else {
return super.toString();
}
}
public Type[] getTypes(Criteria criteria, CriteriaQuery criteriaQuery)
throws HibernateException {
return new Type[] { Hibernate.INTEGER };
}
public String toSqlString(Criteria criteria, int position, CriteriaQuery criteriaQuery)
throws HibernateException {
StringBuffer buf = new StringBuffer();
buf.append("count(");
if (distinct) buf.append("distinct ");
String[] properties = propertyName.split(";");
for (int i = 0; i < properties.length; i++) {
buf.append( criteriaQuery.getColumn(criteria, properties[i]) );
if(i != properties.length - 1)
buf.append(" || ");
}
buf.append(") as y");
buf.append(position);
buf.append('_');
return buf.toString();
}
public MultipleCountProjection setDistinct() {
distinct = true;
return this;
}
}
ExtraProjections.java
package org.hibernate.criterion;
public final class ExtraProjections
{
public static MultipleCountProjection countMultipleDistinct(String propertyNames) {
return new MultipleCountProjection(propertyNames).setDistinct();
}
}
Sample Usage:
String propertyNames = "titleName;titleDescr;titleVersion"
criteria countCriteria = ....
countCriteria.setProjection(ExtraProjections.countMultipleDistinct(propertyNames);
Referenced from https://forum.hibernate.org/viewtopic.php?t=964506
NullPointerException in some cases!
Without criteria.setProjection(Projections.distinct(Projections.property("id")))
all query goes well!
This solution is bad!
Another way is use SQLQuery. In my case following code works fine:
List result = getSession().createSQLQuery(
"SELECT distinct u.id as usrId, b.currentBillingAccountType as oldUser_type,"
+ " r.accountTypeWhenRegister as newUser_type, count(r.accountTypeWhenRegister) as numOfRegUsers"
+ " FROM recommendations r, users u, billing_accounts b WHERE "
+ " r.user_fk = u.id and"
+ " b.user_fk = u.id and"
+ " r.activated = true and"
+ " r.audit_CD > :monthAgo and"
+ " r.bonusExceeded is null and"
+ " group by u.id, r.accountTypeWhenRegister")
.addScalar("usrId", Hibernate.LONG)
.addScalar("oldUser_type", Hibernate.INTEGER)
.addScalar("newUser_type", Hibernate.INTEGER)
.addScalar("numOfRegUsers", Hibernate.BIG_INTEGER)
.setParameter("monthAgo", monthAgo)
.setMaxResults(20)
.list();
Distinction is done in data base! In opposite to:
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
where distinction is done in memory, after load entities!

Categories

Resources