How to enable `relevance-trace` using MarkLogic Java API?

I'm implementing a fairly complex search using the MarkLogic Java API. I would like to enable relevance-trace (relevance trace) to see how my results are scored. Unfortunately, I don't know how to enable it in the Java API. I have tried something like:
DatabaseClient client = initClient();
var qmo = client.newServerConfigManager().newQueryOptionsManager();
var searchOptions = "<search:options xmlns=\"http://marklogic.com/appservices/search\">\n"
        + "  <search-option>relevance-trace</search-option>\n"
        + "</search:options>";
qmo.writeOptions("searchOptions", new StringHandle(searchOptions).withFormat(Format.XML));

QueryManager qm = client.newQueryManager();
StructuredQueryBuilder qb = qm.newStructuredQueryBuilder("searchOptions");
// query definition
qm.search(query, new SearchHandle());
Unfortunately it ends up with the following error:
"Local message: /config/query write failed: Internal Server Error. Server Message: XDMP-DOCNONSBIND:
xdmp:get-request-body(\"xml\") -- No namespace binding for prefix search at line 1 . See the
MarkLogic server error log for further detail."
My question is how to use search options in the MarkLogic Java API; in particular, I'm interested in relevance-trace and simple-score.
Update 1
As suggested by @Jamess Kerr, I have changed my options to
var searchOptions = "<options xmlns=\"http://marklogic.com/appservices/search\">\n"
        + "  <search-option>relevance-trace</search-option>\n"
        + "</options>";
but unfortunately it still doesn't work. After that change I get the following error:
Local message: /config/query write failed: Internal Server Error. Server Message: XDMP-UPDATEFUNCTIONFROMQUERY: xdmp:apply(function() as item()*) -- Cannot apply an update function from a query . See the MarkLogic server error log for further detail.

Your search options XML uses the search: namespace prefix, but you never define that prefix. Since you are already setting the default namespace, just drop the search: prefix from the opening and closing options tags.
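For reference, a corrected write could look like the sketch below (a minimal sketch, assuming the same initClient() setup as in the question and a REST app server that permits writing to /config/query). Once the options are stored, each MatchDocumentSummary in the SearchHandle should expose the trace via getRelevanceInfo().

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.admin.QueryOptionsManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

DatabaseClient client = initClient();
QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();
// The default namespace declared on <options> already covers the element, so no prefix is needed.
String searchOptions =
        "<options xmlns=\"http://marklogic.com/appservices/search\">\n"
      + "  <search-option>relevance-trace</search-option>\n"
      + "</options>";
optionsMgr.writeOptions("searchOptions", new StringHandle(searchOptions).withFormat(Format.XML));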

The original Java query contains both syntactic and semantic issues:
First of all, it is not valid MarkLogic XQuery on its own, in the sense that it contains only the query options portion. Leaving the search: namespace prefix unbound is the other half of the problem.
To tweak your original query, replace the search text between the search:qtext tags (the pink line in the original screenshot) and run the query.
Result:
Matched and Listing 2 documents:
Matched 1 locations in /medals/coin_1333113127296.xml with 94720 score:
73. …pulsating maple leaf coin another world-first, the [Royal Canadian Mint]is proud to launch a numismatic breakthrough from its ambitious and creative R&D team...
Matched 1 locations in /medals/coin_1333078361643.xml with 94720 score:
71. ...the [Royal Canadian Mint]and Royal Australian Mint have put an end to the dispute relating to the circulation coin colouring process...
To put it into context: without a semantic criterion, your original query is equivalent to removing the search:qtext and performing a fuzzy search.
Note:
If you use a serialised term search or search constraints instead of plain text search, you should get higher-scoring results.
The MarkLogic Java API operates in unfiltered mode by default, while cts:search operates in filtered mode by default. Be mindful of how you construct the query and of the score you should expect from the Java API.
The Java API is also really intended for bulk data write/extract/transformation work. In my opinion, qconsole is a better fit for tuning a specific query and gathering search score, relevance, and computation details.

Related

Script fields in hibernate elasticsearch

I'm using hibernate-search-elasticsearch 5.8.2.Final and I can't figure out how to get script fields:
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-script-fields.html
Is there any way to accomplish this functionality?
This is not possible in Hibernate Search 5.8.
In Hibernate Search 5.10 you could get direct access to the REST client, send a REST request to Elasticsearch, and get the result as a JSON string that you would have to parse yourself. However, this is very low-level and you would not benefit from the Hibernate Search APIs at all (no query DSL, no managed entity loading, no direct translation from entity type to index name, ...).
If you want better support for this feature, don't hesitate to open a ticket on our JIRA, describing in details what you are trying to achieve and how you would have expected to be able to do that. We are currently working on Search 6.0 which brings a lot of improvements, in particular when it comes to using native features of Elasticsearch, so it just might be something we could slip into our backlog.
EDIT: I forgot to mention that, while you cannot use server-side scripts, you can still get the full source from your documents, and do some parsing in your application to achieve a similar result. This will work even in Search 5.8:
FullTextEntityManager fullTextEm = Search.getFullTextEntityManager(entityManager);
QueryBuilder qb = fullTextEm.getSearchFactory()
        .buildQueryBuilder().forEntity( VideoGame.class ).get();
FullTextQuery query = fullTextEm.createFullTextQuery(
        qb.keyword()
                .onField( "tags" )
                .matching( "round-based" )
                .createQuery(),
        VideoGame.class
)
        .setProjection( ElasticsearchProjectionConstants.SCORE, ElasticsearchProjectionConstants.SOURCE );

// Each result is an Object[] holding the projected values in the order they were requested.
Object[] projection = (Object[]) query.getSingleResult();
float score = (float) projection[0];
String source = (String) projection[1];
See this section of the documentation.
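To approximate a script field on the application side, you can then parse the projected source and compute the derived value yourself. A minimal sketch using Jackson (the price field and the doubling logic are made-up examples, not part of the answer above):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;

// Client-side equivalent of a script field such as "doc['price'].value * 2".
static double doublePrice(String source) throws IOException {
    JsonNode json = new ObjectMapper().readTree(source);
    return json.path("price").asDouble() * 2;  // "price" is a hypothetical field
}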

Neo4J REST API Call Error - Error reading as JSON ''

This is a Neo4j REST API call related error. From my Java code I'm making a REST API call to a remote Neo4j database, passing a query and parameters. The query being executed is as follows:
MERGE (s:Sequence {name:'CommentSequence'}) ON CREATE SET s.current = 1 ON MATCH SET s.current=s.current+1 WITH s.current as sequenceCounter MERGE (cmnt01:Comment {text: {text}, datetime:{datetime}, type:{type}}) SET cmnt01.id = sequenceCounter WITH cmnt01 MATCH (g:Game {game_id:{gameid}}),(b:Block {block_id:{bid}, game_id:{gameid}}),(u:User {email_id:{emailid}}) MERGE (b)-[:COMMENT]->(cmnt01)<-[:COMMENT]-(u)
Basically this query generates a sequence number at run time and sets the id property of the Comment node to that sequence number before attaching the comment node to a game's block, i.e. for every comment added by the user I'm adding a sequence number as its id.
This works for almost 90% of the cases, but there are a couple of cases a day where it fails with the error below:
ERROR com.exectestret.dao.BaseGraphDAO - Query execution error: Error reading as JSON ''
Why does the Neo4j query not return a proper error code? It just says: Error reading as JSON ''.
The Neo4j version is:
Neo4j Community Edition 2.2.1
Thanks,
Deepesh
It gets HTML back and can't read it as JSON, but it should output the failing HTML. Can you check the log output for that and share it too?
Also check your graph.db/messages.log and data/log/console.log for any error messages.
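One way to capture that HTML is to read the raw response body whenever the status code is not 2xx, before any JSON parsing. A minimal sketch with java.net.HttpURLConnection (the URL is the Neo4j 2.x transactional endpoint; the payload is whatever Cypher/parameter JSON your DAO builds, so both are assumptions here):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// POST a Cypher payload and return the raw body, logging non-2xx responses.
static String postCypher(String payload) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(
            "http://localhost:7474/db/data/transaction/commit").openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    conn.getOutputStream().write(payload.getBytes(StandardCharsets.UTF_8));

    int status = conn.getResponseCode();
    // On errors the body sits on the error stream; capture it before any JSON parsing.
    InputStream body = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
    String raw = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8))
            .lines().collect(Collectors.joining("\n"));
    if (status >= 400) {
        System.err.println("Neo4j returned HTTP " + status + ": " + raw);
    }
    return raw;
}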

Save the result set of a query against a view to a table using the java client library

Recently, one of our clients reported not being able to create a table based on a query against a view. That said, they were able to save the result of a query against a table into another table. This issue spawned a more implementation-focused question using the Java client libraries. Specifically, is there any way to save the result set of a query against a view to a table using the Java client library? I will be digging and will post anything that I find. That said, any early guidance would be appreciated!
To be specific and add more context, I note that the following process failed when the query was run against a union view.
java -jar BigQueryToCloudExporter.jar ./GAFastAccessKey.p12 '' "
Select date(date_add('2014-08-09',floor(datediff(date(sec_to_timestamp(visitstarttime)),'2014-08-03')/7)*7,"DAY")) WeekEndDate
, hits.eventinfo.eventaction GA_XXXXXX
, count(distinct visitID) PDP_PPC
FROM (TABLE_DATE_RANGE([Union_View.GA],
TIMESTAMP('2014-08-30'),
TIMESTAMP('2014-09-13')))
where hits.eventinfo.eventcategory='property attributes'
and brandId=121
--hits.eventinfo.eventcategory='property inquiry'
and trafficsource.medium like '%cpc%'
--and trafficsource.campaign not like '%ppb%'
and trafficsource.campaign like '%mpm%'
group each by WeekEndDate, GA_XXXXXX
order by WeekEndDate, GA_XXXXXX limit 100" StagingQueryTable QueryTable AVRO gs://XXXXXX/QueryTable*.avro
On the other hand, the following process succeeded when the query was run against a BigQuery table (keeping everything else the same).
java -jar BigQueryToCloudExporter.jar ./GAFastAccessKey.p12 '' "
Select date(date_add('2014-08-09',floor(datediff(date(sec_to_timestamp(visitstarttime)),'2014-08-03')/7)*7,"DAY")) WeekEndDate
, hits.eventinfo.eventaction GA_XXXXXX
, count(distinct visitID) PDP_PPC
FROM (TABLE_DATE_RANGE([XXXXXX.ga_sessions_],
TIMESTAMP('2014-08-30'),
TIMESTAMP('2014-09-13')))
where hits.eventinfo.eventcategory='property attributes'
and brandId=121
--hits.eventinfo.eventcategory='property inquiry'
and trafficsource.medium like '%cpc%'
--and trafficsource.campaign not like '%ppb%'
and trafficsource.campaign like '%mpm%'
group each by WeekEndDate, GA_XXXXXX
order by WeekEndDate, GA_XXXXXX limit 100" StagingQueryTable QueryTable AVRO gs://XXXXXX/QueryTable*.avro
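On the Java client library side, a query job can name a destination table for its result set, and this works the same whether the FROM clause references a table or a view. A minimal sketch using the google-cloud-bigquery client (the dataset/table names and legacy-SQL flag mirror the commands above, but all identifiers are placeholders and this is not the BigQueryToCloudExporter code):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Run a (legacy SQL) query against a view and write the result set to a table.
static void saveViewQueryToTable(String sql) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    QueryJobConfiguration config = QueryJobConfiguration.newBuilder(sql)
            .setUseLegacySql(true)               // the question's SQL is legacy syntax
            .setAllowLargeResults(true)          // requires an explicit destination table
            .setDestinationTable(TableId.of("StagingQueryTable", "QueryTable"))  // placeholder dataset/table
            .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
            .build();
    Job job = bigquery.create(JobInfo.of(config)).waitFor();
    if (job.getStatus().getError() != null) {
        throw new RuntimeException(job.getStatus().getError().toString());
    }
}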

ElasticSearch - Using FilterBuilders

I am new to ElasticSearch and Couchbase. I am building a sample Java application to learn more about ElasticSearch and Couchbase.
Reading the ElasticSearch Java API docs, filters are better used in cases where sorting on score is not necessary, and they can be cached.
I still haven't figured out how to use FilterBuilders and have following questions:
Can FilterBuilders be used alone to search?
Or do they always have to be used with a query? (If true, can someone please list an example?)
Going through the documentation, if I want to perform a search based on field values and want to use FilterBuilders, how can I accomplish that? (Using AndFilterBuilder, TermFilterBuilder, or InFilterBuilder? I am not clear about the differences between them.)
For the third question, I actually tested it with search using queries and using filters, as shown below.
I got an empty result (no rows) when I tried searching using FilterBuilders, and I am not sure what I am doing wrong.
Any examples will be helpful. I have had a tough time going through the documentation, which I found sparse, and even searching led to various unreliable user forums.
private void processQuery() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    QueryBuilder qb = QueryBuilders.fieldQuery("doc.address.state", "TX");
    srb.setQuery(qb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}

private void searchWithFilters() {
    SearchRequestBuilder srb = getSearchRequestBuilder(BUCKET);
    srb.setFilter(FilterBuilders.termFilter("doc.address.state", "tx"));
    //AndFilterBuilder andFb = FilterBuilders.andFilter();
    //andFb.add(FilterBuilders.termFilter("doc.address.state", "TX"));
    //srb.setFilter(andFb);
    SearchResponse resp = srb.execute().actionGet();
    System.out.println("response :" + resp);
}
--UPDATE--
As suggested in the answer, changing to lowercase "tx" works. With that question resolved, I still have the following questions:
In what scenario(s) are filters used with a query? What purpose does that serve?
What is the difference between InFilter, TermFilter, and MatchAllFilter? Any illustration will help.
Right, you should use filters to exclude documents from even being considered when executing the query. Filters are faster since they don't involve any scoring, and they are cacheable as well.
That said, it's pretty obvious that you have to use a filter with the search API, which does execute a query and accepts an optional filter. If you only have a filter, you can just use the match_all query together with your filter. A filter can be a simple one, or a compound one that combines multiple filters together.
Regarding the Java API, the names used are the names of the filters available, no big difference. Have a look at this search example for instance. In your code I don't see where you call setFilter on your SearchRequestBuilder object. You don't seem to need the and filter either, since you are using a single filter. Furthermore, it might be that you are indexing using the default mappings, so the term "TX" is lowercased. That's why you don't find any match when you search using the term filter. Try searching for the lowercased term "tx".
You can either change your mapping if you want to keep the "TX" term as it is while indexing, probably setting the field as not_analyzed if it should only be a single token. Otherwise you can change the filter; you might want to have a look at a query that is analyzed, so that your query will be analyzed the same way the content was indexed.
Have a look at the query DSL documentation for more information regarding queries and filters:
MatchAllFilter: matches all your documents; not that useful, I'd say.
TermFilter: filters documents that have fields containing a term (not analyzed).
AndFilter: a compound filter used to combine two or more filters with AND semantics.
I don't know what you mean by InFilterBuilder; I couldn't find any filter with this name.
The query usually contains what the user types into the text search box. Filters are more a way to refine the search, for example by clicking on facet entries. That's why you would still have the query plus one or more filters.
To append to what @javanna said:
A lot of confusion can come from the fact that filters can be defined in several ways:
standalone (with a required query, for instance match_all if all you need is the filters) (http://www.elasticsearch.org/guide/reference/api/search/filter/)
or as part of a filtered query (http://www.elasticsearch.org/guide/reference/query-dsl/filtered-query/)
What's the difference, you might ask? Indeed, you can construct exactly the same logic both ways.
The difference is that a query operates on BOTH the result set and any facets you have defined, whereas a filter (when defined standalone) operates only on the result set and NOT on any facets you may have defined (explained here: http://www.elasticsearch.org/guide/reference/api/search/filter/).
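To make that concrete, here is a minimal sketch against the pre-2.x Java API (the index name and the client are placeholders matching the question's setup): the first request wraps the filter in a filtered query, so facets would see it; the second sets a standalone top-level filter, which hits respect but facets ignore.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.QueryBuilders;

// Filtered query: the term filter restricts both the result set and any facets.
SearchResponse viaFilteredQuery = client.prepareSearch("myindex")
        .setQuery(QueryBuilders.filteredQuery(
                QueryBuilders.matchAllQuery(),
                FilterBuilders.termFilter("doc.address.state", "tx")))
        .execute().actionGet();

// Standalone (top-level) filter: hits are filtered, facets see the whole index.
SearchResponse viaTopLevelFilter = client.prepareSearch("myindex")
        .setFilter(FilterBuilders.termFilter("doc.address.state", "tx"))
        .execute().actionGet();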
To add to the other answers: InFilter is only used with FilterBuilders. Its definition is: a filter for a field based on several terms, matching on any of them.
The query Java API uses FilterBuilders, which is a factory for filter builders that can dynamically create a query from Java code. We do this using a form, and we build our query based on user selections from it with checkboxes, options, and dropdowns.
Here is some example code for FilterBuilders; the snippet below, from that link, uses InFilter:
FilterBuilder filterBuilder;
User user = (User) auth.getPrincipal();
if (user.getGroups() != null && !user.getGroups().isEmpty()) {
    filterBuilder = FilterBuilders.boolFilter()
            .should(FilterBuilders.nestedFilter("userRoles",
                    FilterBuilders.termFilter("userRoles.key", auth.getName())))
            .should(FilterBuilders.nestedFilter("groupRoles",
                    FilterBuilders.inFilter("groupRoles.key", user.getGroups().toArray())));
} else {
    filterBuilder = FilterBuilders.nestedFilter("userRoles",
            FilterBuilders.termFilter("userRoles.key", auth.getName()));
}
...

Iterating over every document in Lotus Domino

I'd like to iterate over every document in a (probably big) Lotus Domino database and be able to continue from the last one if the processing breaks (network connection error, application restart, etc.). I don't have write access to the database.
I'm looking for a way where I don't have to download documents from the server that were already processed. So, I have to pass some starting information to the server indicating which document should be the first in the (possibly restarted) processing.
I've checked the AllDocuments property and the DocumentCollection.getNthDocument method, but this property is unsorted, so I guess the order can change between two calls.
Another idea was using a formula query but it does not seem that ordering is possible with these queries.
The third idea was the Database.getModifiedDocuments method with a corresponding Document.getLastModified call. It seemed promising, but the ordering of the returned collection is not documented and appears to be based on creation time rather than last modification time.
Here is a sample code based on the official example:
System.out.println("startDate: " + startDate);
final DocumentCollection documentCollection =
database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
Document doc = documentCollection.getFirstDocument();
while (doc != null) {
System.out.println("#lastmod: " + doc.getLastModified() +
" #created: " + doc.getCreated());
doc = documentCollection.getNextDocument(doc);
}
It prints the following:
startDate: 2012.07.03 08:51:11 CEDT
#lastmod: 2012.07.03 08:51:11 CEDT #created: 2012.02.23 10:35:31 CET
#lastmod: 2012.08.03 12:20:33 CEDT #created: 2012.06.01 16:26:35 CEDT
#lastmod: 2012.07.03 09:20:53 CEDT #created: 2012.07.03 09:20:03 CEDT
#lastmod: 2012.07.21 23:17:35 CEDT #created: 2012.07.03 09:24:44 CEDT
#lastmod: 2012.07.03 10:10:53 CEDT #created: 2012.07.03 10:10:41 CEDT
#lastmod: 2012.07.23 16:26:22 CEDT #created: 2012.07.23 16:26:22 CEDT
(I don't use any AgentContext here to access the database. The database object comes from a session.getDatabase(null, databaseName) call.)
Is there any way to reliably do this with the Lotus Domino Java API?
If you have access to change the database, or could ask someone to do so, then you should create a view that is sorted on a unique key, or modified date, and then just store the "pointer" to the last document processed.
Barring that, you'll have to maintain a list of previously processed documents yourself. In that case you can use the AllDocuments property and just iterate through them. Use the GetFirstDocument and GetNextDocument as they are reportedly faster than GetNthDocument.
Alternatively you could make two passes, one to gather a list of UNIDs for all documents, which you'll store, and then make a second pass to process each document from the list of UNIDs you have (using GetDocumentByUNID method).
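In the Java API, that two-pass approach could look roughly like the sketch below. The persistence and processing helpers (saveUnidList, loadLastProcessedIndex, processDocument, storeLastProcessedIndex) are hypothetical stubs you would supply:

import java.util.ArrayList;
import java.util.List;
import lotus.domino.Database;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

static void twoPassProcessing(Database database) throws NotesException {
    // Pass 1: collect every document's UNID.
    List<String> unids = new ArrayList<String>();
    DocumentCollection all = database.getAllDocuments();
    Document doc = all.getFirstDocument();
    while (doc != null) {
        unids.add(doc.getUniversalID());
        Document next = all.getNextDocument(doc);
        doc.recycle();  // release the back-end Domino object
        doc = next;
    }
    saveUnidList(unids);  // hypothetical helper: persist the list for restarts

    // Pass 2: process each UNID, checkpointing progress after every document.
    for (int i = loadLastProcessedIndex(); i < unids.size(); i++) {  // hypothetical helper
        Document d = database.getDocumentByUNID(unids.get(i));
        processDocument(d);              // hypothetical processing hook
        storeLastProcessedIndex(i + 1);  // hypothetical checkpoint
        d.recycle();
    }
}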
I don't use the Java API, but in LotusScript, I would do something like this:
Locate a view displaying all documents in the database. If you want the agent to be really fast, create a new view. The first column should be sorted and could contain the Universal ID of the document. The other columns contains all the values you want to read in your agent, in your example that would be the created date and last modified date.
Your code could then simply loop through the view like this:
lastSuccessful = FunctionToReadValuesSomewhere() ' Returns 0 if empty
Set view = thisdb.GetView("MyLookupView")
Set col = view.AllEntries
Set entry = col.GetFirstEntry
cnt = 0
Do Until entry Is Nothing
    cnt = cnt + 1
    If cnt > lastSuccessful Then
        universalID = entry.ColumnValues(0)
        createDate = entry.ColumnValues(1)
        lastmodifiedDate = entry.ColumnValues(2)
        Call YourFunctionToDoStuff(universalID, createDate, lastmodifiedDate)
        Call FunctionToStoreValuesSomeWhere(cnt, universalID)
    End If
    Set entry = col.GetNextEntry(entry) ' advance, otherwise the loop never terminates
Loop
Call FunctionToClearValuesSomeWhere()
Simply store the last successful value and Universal ID in say a text file or environment variable or even profile document in the database.
When you restart the agent, have some code that checks whether the stored values are blank (if so, return 0); otherwise return the last successful value.
Agents already keep track of the documents they have not yet processed, and this bookkeeping is updated automatically via normal processing.
A better way of doing what you're attempting might be to store the results of a search in a profile document. However, if you're trying to relate to documents in a database you do not have write permission to, the only thing you can do is keep a list of the doclinks you've already processed (and any information you need to keep about those documents), or keep a sister database holding one document per doclink plus multiple fields related to the processing you've done on them. Then transfer the lists of IDs and perform the matching on the client to do per-document lookups.
Lotus Notes/Domino databases are designed to be distributed across clients and servers in a replicated environment. In the general case, you do not have a guarantee that starting at a given creation or mod time will bring you consistent results.
If you are 100% certain that no replicas of your target database are ever made, then you can use getModifiedDocuments and then write a sort routine to place (modDateTime,UNID) pairs into a SortedSet or other suitable data structure. Then you can process through the Set, and if you run into an error you can save the modDateTime of the element that you were attempting to process as your restart point. There may be a few additional details for you to work out to avoid duplicates, however, if there are multiple documents with the exact same modDateTime stamp.
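A sketch of that sort-then-restart idea in Java, under the stated no-replica assumption (the key construction is one possible choice; processDocument and saveRestartPoint are hypothetical helpers):

import java.util.Map;
import java.util.TreeMap;
import lotus.domino.Database;
import lotus.domino.DateTime;
import lotus.domino.Document;
import lotus.domino.DocumentCollection;
import lotus.domino.NotesException;

static void processModifiedSince(Database database, DateTime startDate) throws NotesException {
    // Stable order: zero-padded modification millis, then UNID as a tiebreaker
    // so documents sharing the exact same timestamp don't collide.
    TreeMap<String, String> ordered = new TreeMap<String, String>();
    DocumentCollection modified =
            database.getModifiedDocuments(startDate, Database.DBMOD_DOC_DATA);
    for (Document doc = modified.getFirstDocument(); doc != null;
            doc = modified.getNextDocument(doc)) {
        long millis = doc.getLastModified().toJavaDate().getTime();
        ordered.put(String.format("%020d-%s", millis, doc.getUniversalID()),
                doc.getUniversalID());
    }
    for (Map.Entry<String, String> e : ordered.entrySet()) {
        processDocument(database.getDocumentByUNID(e.getValue()));  // hypothetical hook
        saveRestartPoint(e.getKey());  // hypothetical: on failure, resume from this key
    }
}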
I want to make one final remark. I understand that you are asking about Java, but if you are working on a backup or archiving system for compliance purposes, the Lotus C API has special functions that you really should look at.
