We are trying to get the scores that Lucene assigns to Neo4j query results.
IndexManager index = graphDb.index();
Index<Node> fulltextMovies = index.forNodes("Restaurant");
QueryContext query = new QueryContext("name:" + term + "*");
TermQuery t = new TermQuery(new Term("name", term + "m*"));
IndexHits<Node> hits = fulltextMovies.query(t);
System.out.println(hits.currentScore());
The last line of the code always prints 0.0.
Do we have to define custom scores to get this working? As I understand it, Lucene assigns a score to every search result, so I should see a Lucene score for each of my query results. Is this possible?
You can use currentScore() only while consuming the iterator from the index query. As an example:
try (IndexHits<Node> hits = index.query(t)) {
for (Node node : hits) {
System.out.println(node + " " + hits.currentScore());
}
}
I am currently working on a small search engine for college using Lucene 8. I already built it before, but without applying any weights to documents.
I am now required to add the PageRanks of documents as a weight for each document, and I already computed the PageRank values. How can I add a weight to a Document object (not query terms) in Lucene 8? I looked up many solutions online, but they only work for older versions of Lucene. Example source
Here is my (updated) code that generates a Document object from a File object:
public static Document getDocument(File f) throws FileNotFoundException, IOException {
Document d = new Document();
//adding a field
FieldType contentType = new FieldType();
contentType.setStored(true);
contentType.setTokenized(true);
contentType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
contentType.setStoreTermVectors(true);
String fileContents = String.join(" ", Files.readAllLines(f.toPath(), StandardCharsets.UTF_8));
d.add(new Field("content", fileContents, contentType));
//adding other fields, then...
//the boost coefficient (updated):
double coef = 1.0 + ranks.get(path);
d.add(new DoubleDocValuesField("boost", coef));
return d;
}
The issue with my current approach is that I would need a CustomScoreQuery object to search the documents, but this is not available in Lucene 8. Also, I don't want to downgrade now to Lucene 7 after all the code I wrote in Lucene 8.
Edit:
After some (lengthy) research, I added a DoubleDocValuesField to each document holding the boost (see updated code above), and used a FunctionScoreQuery for searching as advised by @EricLavault. However, now every document's score is exactly its boost, regardless of the query! How do I fix that? Here is my searching function:
public static TopDocs search(String query, IndexSearcher searcher, String outputFile) {
try {
Query q_temp = buildQuery(query); //the original query, was working fine alone
Query q = new FunctionScoreQuery(q_temp, DoubleValuesSource.fromDoubleField("boost")); //the new query
q = q.rewrite(DirectoryReader.open(bm25IndexDir));
TopDocs results = searcher.search(q, 10);
ScoreDoc[] filterScoreDosArray = results.scoreDocs;
for (int i = 0; i < filterScoreDosArray.length; ++i) {
int docId = filterScoreDosArray[i].doc;
Document d = searcher.doc(docId);
//here, when printing, I see that the document's score is the same as its "boost" value. WHY??
System.out.println((i + 1) + ". " + d.get("path")+" Score: "+ filterScoreDosArray[i].score);
}
return results;
}
catch(Exception e) {
e.printStackTrace();
return null;
}
}
//function that builds the query, working fine
public static Query buildQuery(String query) {
try {
PhraseQuery.Builder builder = new PhraseQuery.Builder();
TokenStream tokenStream = new EnglishAnalyzer().tokenStream("content", query);
tokenStream.reset();
while (tokenStream.incrementToken()) {
CharTermAttribute charTermAttribute = tokenStream.getAttribute(CharTermAttribute.class);
builder.add(new Term("content", charTermAttribute.toString()));
}
tokenStream.end(); tokenStream.close();
builder.setSlop(1000);
PhraseQuery q = builder.build();
return q;
}
catch(Exception e) {
e.printStackTrace();
return null;
}
}
Starting from Lucene 6.5.0 :
Index-time boosts are deprecated. As a replacement,
index-time scoring factors should be indexed into a doc value field
and combined at query time using eg. FunctionScoreQuery. (Adrien
Grand)
The recommendation is to encode scoring factors (e.g. length normalization factors) into doc values fields rather than using index-time boosts (cf. LUCENE-6819).
Regarding my edited problem (boost value completely replacing search score instead of boosting it), here is what the documentation says about FunctionScoreQuery (emphasis mine):
A query that wraps another query, and uses a DoubleValuesSource to replace or modify the wrapped query's score.
So, when does it replace, and when does it modify?
Turns out, the code I was using is for entirely replacing the score by the boost value:
Query q = new FunctionScoreQuery(q_temp, DoubleValuesSource.fromDoubleField("boost")); //the new query
What I needed to do instead was to use the function boostByValue, which modifies the search score (by multiplying it by the boost value):
Query q = FunctionScoreQuery.boostByValue(q_temp, DoubleValuesSource.fromDoubleField("boost"));
And now it works! Thanks @EricLavault for the help!
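To make the difference concrete, here is a minimal plain-Java sketch of the arithmetic only. It uses no Lucene classes; the class name BoostModes and the scores and boosts are made-up values for illustration.

```java
// Plain-Java illustration of the two scoring modes; no Lucene classes are used.
final class BoostModes {

    /** Mirrors new FunctionScoreQuery(query, source): the source value replaces the query score. */
    static double replace(double queryScore, double boost) {
        return boost;
    }

    /** Mirrors FunctionScoreQuery.boostByValue(query, source): the query score is multiplied by the source value. */
    static double multiply(double queryScore, double boost) {
        return queryScore * boost;
    }

    public static void main(String[] args) {
        double[] queryScores = {2.0, 0.5};  // hypothetical text-relevance scores
        double[] boosts = {1.25, 1.5};      // hypothetical 1.0 + PageRank values
        for (int i = 0; i < queryScores.length; i++) {
            System.out.println("doc " + i
                    + " replaced=" + replace(queryScores[i], boosts[i])
                    + " boosted=" + multiply(queryScores[i], boosts[i]));
        }
    }
}
```

In the "replaced" column the query score has no influence at all, which matches the symptom of every document scoring exactly its boost.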
I'm trying to write a query that finds all possible paths matching the pattern "(Order) - [ORDERS] -> (Product) - [PART_OF] -> (Category)", and I would like to get the whole path (i.e. all 3 nodes and 2 relationships as their appropriate classes).
The method I used below only gives me one column of data (number of orders: 2155). When I read a second column (the second for loop), I get 0 rows (number of products: 0). Is there a way to save all the results as nodes and relationships, or do I have to run the query five times?
Please help!
String query = "MATCH (o:Order)-[:ORDERS]->(p:Product)-[:PART_OF]->(cate:Category) return o,p,cate";
try( Transaction tx = db.beginTx();
Result result = db.execute(query) ){
Iterator<Node> o_column = result.columnAs( "o" );
int i = 0;
for ( Node node : Iterators.asIterable( o_column ) )
{
i++;
}
System.out.println("number of orders: " + i);
i = 0;
Iterator<Node> p_column = result.columnAs( "p" );
for ( Node node : Iterators.asIterable( p_column ) )
{
i++;
}
System.out.println("number of products: " + i);
tx.success();
}
I've found a workaround, shown below: I changed the return values to node IDs using id(), then look each node up with GraphDatabaseService.getNodeById(long):
String query = "MATCH (o:Order)-[:ORDERS]->(p:Product)-[:PART_OF]->(cate:Category) return id(o), id(p), id(cate)";
long nodeID = Long.parseLong(column.getValue().toString()); // node IDs are longs
Node node = db.getNodeById(nodeID);
If you do this:
MATCH path=(o:Order)-[:ORDERS]->(p:Product)-[:PART_OF]->(cate:Category) return path
you can process path in your loop and unpack it. It takes a bit of exploring, but all the information (nodes and relationships) is in there.
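The second loop in the question sees zero rows because the result can only be consumed once: reading one column to the end exhausts it. A minimal sketch of the alternative is to walk the rows a single time and pick every column out of each row. The helper below is plain Java with stand-in data; the name SinglePassColumns is hypothetical. As far as I know the embedded Result is itself an Iterator<Map<String, Object>>, so it could be passed in directly, but treat that as an assumption to check against your Neo4j version.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SinglePassColumns {

    /** Walks the rows exactly once and groups the cell values by column name. */
    static Map<String, List<Object>> collect(Iterator<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        while (rows.hasNext()) {
            for (Map.Entry<String, Object> cell : rows.next().entrySet()) {
                columns.computeIfAbsent(cell.getKey(), k -> new ArrayList<>())
                       .add(cell.getValue());
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Stand-in rows; in the real code each value would be a Node (or a Path).
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(Map.of("o", "order1", "p", "productA"));
        rows.add(Map.of("o", "order2", "p", "productB"));

        Map<String, List<Object>> columns = collect(rows.iterator());
        System.out.println("number of orders: " + columns.get("o").size());
        System.out.println("number of products: " + columns.get("p").size());
    }
}
```

With `return path`, the same loop applies: each row holds one value under "path", which can be cast to the Path type and asked for its nodes and relationships.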
Hope that helps.
Regards,
Tom
I have a M-to-M relation going from Nomination to User mapped on a "Nominee" table. I have the following method to encapsulate results in a paging class called "ResultPage":
protected ResultPage<T> findPageByCriteria(Criteria criteria, int page,
int pageSize) {
DataVerify.notNull(criteria);
DataVerify.greaterThan(page, 0, "Invalid page number");
DataVerify.isTrue(pageSize >= 0, "Invalid page size");
if (logger.isDebugEnabled()) {
logger.debug("Arguments: ");
logger.debug("Page: " + page);
logger.debug("Page size: " + pageSize);
}
int totalItems = 0;
List<T> results = null;
if (pageSize != 0) {
totalItems = ((Number) criteria.setProjection(Projections.rowCount()).
uniqueResult()).intValue();
criteria.setProjection(null);
criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);
criteria.addOrder(Order.desc("id"));
results = criteria.setFirstResult((page-1) * pageSize).
setMaxResults(pageSize).list();
} else {
results = criteria.setFirstResult((page-1) * pageSize).
list();
totalItems = results.size();
}
ResultPage<T> resultsPage = new ResultPage<T>(results, page,
totalItems,
(pageSize != 0) ? pageSize :
totalItems);
if (logger.isDebugEnabled()){
logger.debug("Total Results: " + resultsPage.getTotalItems());
}
return resultsPage;
}
Fetching now works correctly, but my result count is inconsistent. This only happens when a "Nomination" has more than one user assigned to it: the query then counts users instead of root entities, so I get totals of "1 to 22" per page instead of the "1 to 25" I specified, as if there were 22 nominations but 25 users in total.
Can I get some help for this? Let me know if I have to clarify.
If anything, this is the question that comes closest to my problem: how to retrieve distinct root entity row count in hibernate?
The solution I use for this problem is to run a first query that loads only the IDs of the root entities satisfying the criteria (i.e. the IDs of your 25 nominations), and then issue a second query that loads the data for these 25 IDs, like the following:
select n from Nomination n
[... joins and fetches]
where n.id in (:ids)
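To see why paging has to happen on the root IDs rather than on the joined rows, here is a small plain-Java sketch with no Hibernate involved. The class name RootIdPaging and the {nominationId, userId} rows are invented for illustration; the second method corresponds to the first (ID-only) query of the two-query approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RootIdPaging {

    /** Pages over the joined rows, then de-duplicates: may yield fewer roots than pageSize. */
    static List<Long> pageOverJoinedRows(List<long[]> joinedRows, int page, int pageSize) {
        int from = Math.min((page - 1) * pageSize, joinedRows.size());
        int to = Math.min(from + pageSize, joinedRows.size());
        Set<Long> roots = new LinkedHashSet<>();
        for (long[] row : joinedRows.subList(from, to)) {
            roots.add(row[0]); // row[0] is the root (nomination) ID
        }
        return new ArrayList<>(roots);
    }

    /** De-duplicates the root IDs first, then pages over them: always a full page if enough roots exist. */
    static List<Long> pageOverRootIds(List<long[]> joinedRows, int page, int pageSize) {
        Set<Long> roots = new LinkedHashSet<>();
        for (long[] row : joinedRows) {
            roots.add(row[0]);
        }
        List<Long> ids = new ArrayList<>(roots);
        int from = Math.min((page - 1) * pageSize, ids.size());
        int to = Math.min(from + pageSize, ids.size());
        return new ArrayList<>(ids.subList(from, to));
    }

    public static void main(String[] args) {
        // Nomination 1 has two users, so it occupies two joined rows.
        List<long[]> rows = Arrays.asList(
                new long[]{1, 10}, new long[]{1, 11}, new long[]{2, 12}, new long[]{3, 13});
        System.out.println(pageOverJoinedRows(rows, 1, 3)); // only two distinct nominations
        System.out.println(pageOverRootIds(rows, 1, 3));    // three nominations, as requested
    }
}
```

The IDs returned by the second method are what would be bound to the :ids parameter of the query above.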
Given a document that is already indexed, at search time I must split it into two parts: the first part consists of the first 100 words (tokens), and the rest of the document is the second part. I have to weight the two parts differently: the second part counts for 70% of the score and the first for 30%.
EDIT 2: I tried creating a searcher that uses SpanPositionRangeQuery, but I must have misunderstood how SpanQuery works, because I can't get any hits (I used lukeall to verify that the words I was searching for were indexed). Can someone give me a hand?
public static void search(String indexDir, String q) throws Exception
{
Directory dir = FSDirectory.open(new File(indexDir), null);
IndexSearcher is = new IndexSearcher(dir);
Term term = new Term("Field", q);
SpanPositionRangeQuery spanQuery = new SpanPositionRangeQuery(new SpanTermQuery(term), 0, 100);
spanQuery.setBoost(0.3f);
long start = System.currentTimeMillis();
TopDocs hits = is.search(spanQuery, 10);
//TopDocs hits = is.search(query, 10);
long end = System.currentTimeMillis();
System.err.println("I found " + hits.totalHits + " documents (in " +
(end - start) + " milliseconds) '" +
q + "':");
for (int i=0;i<hits.scoreDocs.length;i++)
{
ScoreDoc scoreDoc = hits.scoreDocs[i];
Document doc = is.doc(scoreDoc.doc);
System.out.println(doc.get("filename"));
}
is.close();
}
I don't know how to combine the query parser with SpanPositionRangeQuery to get what I need...
Yes, this can be done by setting the boost for each clause in a BooleanQuery. Using separate fields will work, but isn't strictly necessary. Lucene has a SpanPositionRangeQuery suitable for searching part of a document.
<SpanPositionRangeQuery: spanPosRange(field:term, 0, 100)^0.3>
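As a rough illustration of what the 30%/70% weighting means, here is a plain-Java sketch that splits a token list at a position and combines the per-part match counts with the two weights. It is not Lucene's scoring (which also involves term statistics and normalization) and uses no Lucene classes; the class name, the split of 3 in the demo, and the sample tokens are assumptions made for readability.

```java
import java.util.Arrays;
import java.util.List;

final class PositionWeightedScore {

    /**
     * Counts occurrences of term in positions [0, split) and [split, end),
     * then returns firstWeight * firstCount + restWeight * restCount.
     */
    static double score(List<String> tokens, String term, int split,
                        double firstWeight, double restWeight) {
        int firstCount = 0;
        int restCount = 0;
        for (int position = 0; position < tokens.size(); position++) {
            if (tokens.get(position).equals(term)) {
                if (position < split) {
                    firstCount++;
                } else {
                    restCount++;
                }
            }
        }
        return firstWeight * firstCount + restWeight * restCount;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "lucene", "is", "fast", "and", "lucene", "scales", "lucene");
        // One match in the first 3 positions, two matches in the rest.
        System.out.println(score(tokens, "lucene", 3, 0.3, 0.7));
    }
}
```

In Lucene terms, each part would be its own position-range clause with its own boost, and the two clauses would be combined in a BooleanQuery as described above.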
How to extract term frequency of each word from a Lucene 5.2.1 index using java?
I have code that worked with a previous Lucene version but no longer works. I think most code on the Internet targets earlier versions of Lucene.
You can get the term frequency of a given term from IndexReader.totalTermFreq, such as:
Term myTerm = new Term("contentfield", "myterm");
long totaltf = myReader.totalTermFreq(myTerm);
If you want to iterate over all the terms in the index and get the frequency of each, you can use MultiFields:
Fields fields = MultiFields.getFields(reader);
Iterator<String> fieldsIter = fields.iterator();
while (fieldsIter.hasNext()) {
String fieldname = fieldsIter.next();
TermsEnum terms = fields.terms(fieldname).iterator();
BytesRef term;
while ((term = terms.next()) != null) {
System.out.println(fieldname + ":" + term.utf8ToString() + " ttf:" + terms.totalTermFreq());
//Or whatever else you want to do with it...
}
}