I am trying to implement search suggestions for my app. Essentially I need a kind of "multi-term prefix query", and I was trying to use a PrefixCompletionQuery. The problem is that an IllegalArgumentException is thrown when the search or suggest methods are called on a SuggestIndexSearcher object.
I wrote some sample code to reproduce the problem:
public static void main(String[] args) throws IOException {
    RAMDirectory dir = new RAMDirectory(); // just for this experiment
    Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));

    var doc = new Document();
    doc.add(new SuggestField("suggest", "Hi everybody!", 4));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new SuggestField("suggest", "nice to meet you", 4));
    writer.addDocument(doc);

    writer.commit(); // maybe redundant
    writer.close();

    var reader = DirectoryReader.open(dir);
    var searcher = new SuggestIndexSearcher(reader);
    var query = new PrefixCompletionQuery(analyzer, new Term("suggest", "everyb"));
    TopDocs results = searcher.search(query, 5);
    for (var res : results.scoreDocs) {
        System.out.println(reader.document(res.doc).get("id"));
    }
}
And this is what I get:
Exception in thread "main" java.lang.IllegalArgumentException: suggest is not a SuggestField
at org.apache.lucene.search.suggest.document.CompletionWeight.bulkScorer(CompletionWeight.java:86)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
at experiments.main.main(main.java:67) // i.e. the line: TopDocs results = searcher.search(query, 5);
To be as complete as possible: the project depends on lucene-core 8.8.2 and lucene-suggest 8.8.2.
Where am I wrong?
I think you have to change the postings format of your suggest field by adding a custom codec to your IndexWriter.
For example, something like this:
RAMDirectory dir = new RAMDirectory();
Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene87Codec() {
    @Override
    public PostingsFormat getPostingsFormatForField(String field) {
        if (field.equals("suggest")) {
            return new Completion84PostingsFormat();
        }
        return super.getPostingsFormatForField(field);
    }
};
config.setCodec(codec);
IndexWriter indexWriter = new IndexWriter(dir, config);
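With the completion postings format in place, the lookup from the question should then work. As a sketch (assuming the lucene-suggest 8.8.x API), you can also call SuggestIndexSearcher.suggest, which takes the completion query, the number of results, and a skip-duplicates flag, and returns the matched suggestion text directly:

var reader = DirectoryReader.open(dir);
var suggestSearcher = new SuggestIndexSearcher(reader);
var query = new PrefixCompletionQuery(analyzer, new Term("suggest", "everyb"));
// suggest(...) returns TopSuggestDocs instead of plain TopDocs
TopSuggestDocs suggestions = suggestSearcher.suggest(query, 5, true);
for (TopSuggestDocs.SuggestScoreDoc res : suggestions.scoreLookupDocs()) {
    System.out.println(res.key); // the suggestion text itself
}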
I build a search index for Lucene like this:
Document doc = new Document();
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
Field tagField = new Field("tag", joinListStr(gifModel.getTags()), Field.Store.YES, Field.Index.ANALYZED);
Field textField = new Field("text", gifModel.getText(), Field.Store.NO, Field.Index.ANALYZED);
doc.add(idField);
doc.add(tagField);
doc.add(textField);
iwriter.addDocument(doc);
I want to delete that document by Term via the _id field, according to this article:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexReader indexReader = IndexReader.open(directory);
        Term term = new Term("_id", id);
        int num = indexReader.deleteDocuments(term);
        indexReader.close();
        return new ReturnMap(num);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}
But num is always 0 here, and the search results show the document is still in the index. What am I missing?
EDIT
Changing the IndexReader to an IndexWriter still doesn't work:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_CURRENT, new SmartChineseAnalyzer(Version.LUCENE_CURRENT));
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        Term term = new Term("_id", id);
        indexWriter.deleteDocuments(term);
        indexWriter.close();
        return new ReturnMap(0);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}
What version of Lucene are you using? IndexReader.deleteDocuments no longer exists; it was deprecated after Lucene 3.6. Either way, use the IndexWriter class:
Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
Term term = new Term("_id", id);
indexWriter.deleteDocuments(term);
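One detail worth adding (my note, not from the original answer): deletes only become visible once the writer commits or closes, and any already-open IndexReader has to be reopened to see them:

indexWriter.deleteDocuments(term);
indexWriter.commit(); // make the delete durable and visible to newly opened readers
indexWriter.close();
// readers opened before the commit must be reopened to observe the delete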
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
It seems you have made the _id field non-indexed (Field.Index.NO), so it cannot be searched, even though it is stored. You will have to use a field that is searchable in the index.
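With the pre-4.0 Field API used in the question, that means indexing the id as a single un-analyzed token so that new Term("_id", id) can actually match it. A sketch (untested against the asker's setup):

// Indexed, so deleteDocuments(new Term("_id", id)) can find the document,
// but not analyzed, so the value stays one exact token.
Field idField = new Field("_id", "58369c7e0293a47b09d34605",
        Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);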
Currently I am working on the auto-suggestion part of my application, using Lucene. The auto-suggestion works fine in a console application, but now that I have integrated it into a web application it does not work the desired way.
When documents are searched for the first time with some keyword, search and auto-suggestion both work fine and show results. But when I search again, for some other keyword or the same one, neither the auto-suggestion nor the search shows any result. I am not able to figure out why this weird result occurs.
The snippets for the auto-suggestion as well as the search are as follows:
final int HITS_PER_PAGE = 20;
final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";

String searchText = request.getParameter("search_text");

BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;
try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}

Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);
try {
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }

    // Auto suggestion of the data
    Dictionary dictionary = new LuceneDictionary(reader, "content");
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);
    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: " + lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
        System.out.println(lookupResult.key + " --- " + lookupResult.value);
    }
    analyzingSuggester.close();
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
For example, in the first iteration, if I search for the word "sample":
Auto-suggestion gives me: sample, samples, sampler, etc. (these are words in the documents)
Search result: sample
But if I search again with the same or a different text, it shows no result, and the LookupResult list size is zero. I am not getting why this is happening. Please help.
Below is the updated code for the index creation from a set of documents.
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();

List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();
String fileNames = (String) request.getAttribute("message");
File file = new File("F:\\Sample\\SampleRichDocuments" + fileNames);
ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);

Metadata metadata = new Metadata();

// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);
try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
} catch (TikaException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);

Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();
Date date = new Date();

while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();

    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();

    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");
    String fileNameArr[] = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }
    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }

    String document_id = UUID.randomUUID().toString();
    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);

    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);

    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);

    writer.addDocument(doc);
    analyzer.close();
}

writer.commit();
writer.deleteUnusedFiles();

long endTime = System.currentTimeMillis();
writer.close();
I have also observed that from the second search iteration onwards, the files in the index directory are getting deleted, and only the file with the .segment suffix changes, like .segmenta, .segmentb, .segmentc, etc. I don't know why this weird situation is happening.
Your code looks pretty straightforward, so I suspect you are facing this problem because something is going wrong with your indexes. Providing information about how you are building the indexes might help to diagnose the issue. But exact code this time :)
I think your problem is with the writer.deleteUnusedFiles() call.
According to the JavaDocs, this call can "delete unreferenced index commits". Which commits get deleted is driven by the IndexDeletionPolicy. However, "the default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2)."
The documentation also talks about "delete on last close": once an index commit has been used and closed (e.g. during a search), its files can be deleted. So all the files that backed your first search result are deleted immediately.
Try this:
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
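Alternatively, the most direct fix along the same lines (my suggestion, not spelled out above) is simply to drop the deleteUnusedFiles() call from the indexing code and let the default policy prune old files on its own:

writer.addDocument(doc);
// ...
writer.commit();
// writer.deleteUnusedFiles(); // removed: the default deletion policy already
//                             // cleans up safely after each new commit
writer.close();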
I'm creating a Lucene index for city names and country codes (which depend on each other). I want the country codes to be searchable in lowercase and as an exact match.
As a first step, I am trying to query a single country code and find all indexed elements that match that code. But my result is always empty.
//prepare
VERSION = Version.LUCENE_4_9;
IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer());
//index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Field.Store.YES));
writer.addDocument(doc);
//lookup
Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer()).parse(countryCode);
Result: when I query country codes like "IT", "DE", "EN", etc., the result is always empty. Why? Is SimpleAnalyzer wrong for 2-letter country codes?
For a StringField, you can use a TermQuery instead of the QueryParser:
Directory dir = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9, new SimpleAnalyzer(Version.LUCENE_4_9));
IndexWriter writer = new IndexWriter(dir, config);
String countryCode = "DE";
// index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Store.YES));
writer.addDocument(doc);
writer.close();
IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));
//lookup
Query query = new TermQuery(new Term("countryCode", countryCode));
TopDocs docs = search.search(query, 1);
System.out.println(docs.totalHits);
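The question also asks for lowercase searching. StringField bypasses the analyzer, so nothing lowercases the value for you. A common pattern (my sketch, not part of the answer above; userInput is a hypothetical variable for whatever the user typed) is to lowercase the value both at index time and at query time:

// Index: keep the value as one exact token, but lowercase it first
doc.add(new StringField("countryCode", countryCode.toLowerCase(Locale.ROOT), Store.YES));

// Lookup: normalize the user's input the same way before building the TermQuery
Query query = new TermQuery(new Term("countryCode", userInput.toLowerCase(Locale.ROOT)));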
I'm a bit confused here. I'll assume your IndexWriter is initialized in some part of your code not provided, but why aren't you passing a Version into SimpleAnalyzer? There is no no-arg constructor for SimpleAnalyzer, not since 3.x anyway.
That's the only real issue I see. Here is a working example using your code:
private static Version VERSION;

public static void main(String[] args) throws IOException, ParseException {
    // prepare
    VERSION = Version.LUCENE_4_9;
    Directory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer(VERSION));
    IndexWriter writer = new IndexWriter(dir, config);
    String countryCode = "DE";

    // index
    Document doc = new Document();
    doc.add(new TextField("countryCode", countryCode, Field.Store.YES));
    writer.addDocument(doc);
    writer.close();

    IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));

    // lookup
    Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer(VERSION)).parse(countryCode);
    TopDocs docs = search.search(query, 1);
    System.out.println(docs.totalHits);
}
I am working on this piece of code, which adds a single document to a Lucene (4.7) index and then tries to find it by querying a term that certainly exists in the document. But the IndexSearcher doesn't return any documents. What is wrong with my code? Thank you for your comments and feedback.
String indexDir = "/home/richard/luc_index_03";
try {
    Directory directory = new SimpleFSDirectory(new File(indexDir));
    Analyzer analyzer = new SimpleAnalyzer(Version.LUCENE_47);
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_47, analyzer);
    conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
    conf.setRAMBufferSizeMB(256.0);
    IndexWriter indexWriter = new IndexWriter(directory, conf);

    Document doc = new Document();
    String title = "New York is an awesome city to live!";
    doc.add(new StringField("title", title, StringField.Store.YES));
    indexWriter.addDocument(doc);
    indexWriter.commit();
    indexWriter.close();
    directory.close();

    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexDir)));
    IndexSearcher indexSearcher = new IndexSearcher(reader);

    String field = "title";
    SimpleQueryParser qParser = new SimpleQueryParser(analyzer, field);
    String queryText = "New York";
    Query query = qParser.parse(queryText);

    int hitsPerPage = 100;
    TopDocs results = indexSearcher.search(query, 5 * hitsPerPage);
    System.out.println("number of results: " + results.totalHits);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = results.totalHits;
    for (ScoreDoc scoreDoc : hits) {
        Document docC = indexSearcher.doc(scoreDoc.doc);
        String path = docC.get("path");
        String titleC = docC.get("title");
        String ne = docC.get("ne");
        System.out.println(path + "\n" + titleC + "\n" + ne);
        System.out.println("---*****----");
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
After running, I just get:
number of results: 0
This is because you use StringField. From the javadoc:
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
Just use TextField instead and you should be ok.
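A minimal sketch of that change, keeping the rest of the question's code as-is:

// Analyzed: the title is indexed as tokens ("new", "york", "is", ...) rather
// than one single token, so the parsed query for "New York" can match.
doc.add(new TextField("title", title, Field.Store.YES));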
I tried to implement an index-based text search with Lucene 4.3.1. The code is below. I created the index with an NGramTokenizer because I would like to find search results that are too far away for a FuzzyQuery.
I have two problems with my solution. The first is that I don't understand why it finds some things but not others. E.g., if I look for "Buter", "utter" or "Bute" it finds "Butter", but if I look for "Btter" there's no result. Is there an error in my implementation? What should I do differently?
Also, I would like it to always give me (e.g.) 10 results for each query. Is this even achievable with my code, and what would I need to change to get those 10 results?
Here's the code:
public LuceneIndex() throws IOException {
    File dir = new File(indexDirectoryPath);
    index = FSDirectory.open(dir);
    analyzer = new NGramAnalyzer();
    config = new IndexWriterConfig(luceneVersion, analyzer);
    indexWriter = new IndexWriter(index, config);
    reader = DirectoryReader.open(FSDirectory.open(dir));
    searcher = new IndexSearcher(reader);
    queryParser = new QueryParser(luceneVersion, "label", new NGramAnalyzer());
}

/**
 * building the index
 * @param graph
 * @throws IOException
 */
public void makeIndex(MyGraph graph) throws IOException {
    FieldType fieldType = new FieldType();
    fieldType.setTokenized(true);

    // read the items that should be indexed
    ArrayList<String> DbList = Helper.readListFromFileDb(indexFilePath);
    for (String word : DbList) {
        Document doc = new Document();
        doc.add(new TextField("label", word, Field.Store.YES));
        indexWriter.addDocument(doc);
    }
    indexWriter.close();
}

public void searchIndexWithQueryParser(String searchString, int numberOfResults) throws IOException, ParseException {
    System.out.println("Searching for '" + searchString + "' using QueryParser");
    Query query = queryParser.parse(searchString);
    System.out.println(query.toString());
    TopDocs results = searcher.search(query, numberOfResults);
    ScoreDoc[] hits = results.scoreDocs;

    // just to see some output...
    int i = 0;
    Document doc = searcher.doc(hits[i].doc);
    String label = doc.get("label");
    System.out.println(label);
}
Edit: Code for the NGramAnalyzer
public class NGramAnalyzer extends Analyzer {
    int minGram = 2;
    int maxGram = 2;

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer source = new NGramTokenizer(reader, minGram, maxGram);
        CharArraySet charArraySet = StopFilter.makeStopSet(Version.LUCENE_43,
                FoodProductBlackList.blackList, true);
        TokenStream filter = new StopFilter(Version.LUCENE_43, source, charArraySet);
        return new TokenStreamComponents(source, filter);
    }
}