Can't delete a document from a Lucene index [duplicate] - Java

This question already has an answer here:
can't delete document with lucene IndexWriter.deleteDocuments(term)
(1 answer)
Closed 6 years ago.
I build a search index with Lucene like this:
Document doc = new Document();
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
Field tagField = new Field("tag", joinListStr(gifModel.getTags()), Field.Store.YES, Field.Index.ANALYZED);
Field textField = new Field("text", gifModel.getText(), Field.Store.NO, Field.Index.ANALYZED);
doc.add(idField);
doc.add(tagField);
doc.add(textField);
iwriter.addDocument(doc);
I want to delete that document by Term on the _id field, following this article:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexReader indexReader = IndexReader.open(directory);
        Term term = new Term("_id", id);
        int num = indexReader.deleteDocuments(term);
        indexReader.close();
        return new ReturnMap(num);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}
But num is always 0, and searches show the document is still in the index. What am I missing?
EDIT
Changing the IndexReader to an IndexWriter still doesn't work:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_CURRENT, new SmartChineseAnalyzer(Version.LUCENE_CURRENT));
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        Term term = new Term("_id", id);
        indexWriter.deleteDocuments(term);
        indexWriter.close();
        return new ReturnMap(0);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}

What version of Lucene are you using? IndexReader.deleteDocuments no longer exists: it was deprecated in Lucene 3.6 and removed in 4.0. Either way, use the IndexWriter class.
// Lucene 5+ API: FSDirectory.open takes a Path, IndexWriterConfig takes only an Analyzer
Directory directory = FSDirectory.open(Paths.get(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
Term term = new Term("_id", id);
indexWriter.deleteDocuments(term);
indexWriter.close(); // commits the pending deletion
Regarding IndexWriter.deleteDocuments(term), look at how your id field is declared:
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
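It seems you have made the id field unindexed (Field.Index.NO). It is stored, but it cannot be searched, so deleteDocuments(term) has no term to match. You will have to use a field that is indexed, e.g. Field.Index.NOT_ANALYZED, so the whole id is kept as a single searchable term.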
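A minimal sketch of the fix, assuming Lucene 8 or later with only lucene-core on the classpath (the question uses the older 3.x Field.Index API, so this is a modern equivalent, not the asker's code). It indexes the same id once as a stored-only field, which behaves like Field.Index.NO, and once as a StringField, then deletes through IndexWriter.deleteDocuments and counts what is left. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteByIdSketch {

    /** Indexes one document, deletes it by its "_id" term, and returns how many documents remain. */
    public static int remainingAfterDelete(boolean indexId) {
        String id = "58369c7e0293a47b09d34605";
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                if (indexId) {
                    // Indexed as a single, unanalyzed term, so Term("_id", id) can match it.
                    doc.add(new StringField("_id", id, Field.Store.YES));
                } else {
                    // Stored only, like Field.Index.NO: retrievable, but there is no term to match.
                    doc.add(new StoredField("_id", id));
                }
                writer.addDocument(doc);
                writer.commit();

                // Deletes go through IndexWriter, not IndexReader.
                writer.deleteDocuments(new Term("_id", id));
            } // close() commits the pending delete (commitOnClose defaults to true)

            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(new MatchAllDocsQuery());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stored only: " + remainingAfterDelete(false) + " document(s) left");
        System.out.println("StringField: " + remainingAfterDelete(true) + " document(s) left");
    }
}
```

The stored-only variant leaves its document behind because no `_id` term exists in the index, while the StringField variant is deleted. StringField is chosen over TextField so that the id is never altered by an analyzer.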

Related

Lucene suggest: "is not a SuggestField" exception when using a CompletionQuery

I am trying to implement search suggestions for my app. What I actually need is a kind of "multi-term prefix query", so I was trying to use a PrefixCompletionQuery. The problem is that an IllegalArgumentException is thrown when the "search" or "suggest" method is called on a SuggestIndexSearcher object.
I wrote some sample code to reproduce the problem:
public static void main(String[] args) throws IOException {
    RAMDirectory dir = new RAMDirectory(); // just for this experiment
    Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
    var doc = new Document();
    doc.add(new SuggestField("suggest", "Hi everybody!", 4));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new SuggestField("suggest", "nice to meet you", 4));
    writer.addDocument(doc);
    writer.commit(); // maybe redundant
    writer.close();
    var reader = DirectoryReader.open(dir);
    var searcher = new SuggestIndexSearcher(reader);
    var query = new PrefixCompletionQuery(analyzer, new Term("suggest", "everyb"));
    TopDocs results = searcher.search(query, 5);
    for (var res : results.scoreDocs) {
        System.out.println(reader.document(res.doc).get("id"));
    }
}
And this is what I get:
Exception in thread "main" java.lang.IllegalArgumentException: suggest is not a SuggestField
at org.apache.lucene.search.suggest.document.CompletionWeight.bulkScorer(CompletionWeight.java:86)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
at experiments.main.main(main.java:67) <- TopDocs results = searcher.search(query, 5);
To be as complete as possible: the project depends on lucene-core 8.8.2 and lucene-suggest 8.8.2.
Where am I wrong?
I think you have to change the postings format of your suggest field by adding a custom codec to your index writer.
For example, something like this:
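import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.PostingsFormat;
import org.apache.lucene.codecs.lucene87.Lucene87Codec;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.suggest.document.Completion84PostingsFormat;
import org.apache.lucene.search.suggest.document.CompletionAnalyzer;
import org.apache.lucene.store.RAMDirectory;

RAMDirectory dir = new RAMDirectory();
Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
IndexWriterConfig config = new IndexWriterConfig(analyzer);
// Route the "suggest" field to the completion postings format; other fields keep the default.
Codec codec = new Lucene87Codec() {
    @Override
    public PostingsFormat getPostingsFormatForField(String field) {
        if (field.equals("suggest")) {
            return new Completion84PostingsFormat();
        }
        return super.getPostingsFormatForField(field);
    }
};
config.setCodec(codec);
IndexWriter indexWriter = new IndexWriter(dir, config);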

Auto Suggestion not working in Lucene after first search iteration

I am currently working on the auto-suggestion part of my application, using Lucene. Auto-suggestion of words works fine in a console application, but now that I have integrated it into the web application, it doesn't work as intended.
When documents are searched for the first time with some keywords, both search and auto-suggestion work fine and show results. But when I search again, for a different keyword or the same one, neither auto-suggestions nor search results are shown. I can't figure out why this happens.
The snippets for auto-suggestion and search are as follows:
final int HITS_PER_PAGE = 20;
final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
String searchText = request.getParameter("search_text");
BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;
try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}
Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);
try {
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }
    // Auto Suggestion of the data
    Dictionary dictionary = new LuceneDictionary(reader, "content");
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);
    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: " + lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
        System.out.println(lookupResult.key + " --- " + lookupResult.value);
    }
    analyzingSuggester.close();
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
For example:
In the first iteration, if I search for the word "sample":
Auto-suggestion gives me: sample, samples, sampler, etc. (these are words in the documents)
Search result: sample
But if I search again, with the same text or a different one, no results are shown and the LookupResult list size is zero.
I don't understand why this is happening. Please help.
Below is the updated code that creates the index from a set of documents.
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();
List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();
String fileNames = (String) request.getAttribute("message");
File file = new File("F:\\Sample\\SampleRichDocuments" + fileNames);
ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);
Metadata metadata = new Metadata();
// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);
try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
} catch (TikaException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);
Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);
Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();
Date date = new Date();
while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();
    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();
    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");
    String fileNameArr[] = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }
    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }
    String document_id = UUID.randomUUID().toString();
    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);
    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);
    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);
    writer.addDocument(doc);
    analyzer.close();
}
writer.commit();
writer.deleteUnusedFiles();
long endTime = System.currentTimeMillis();
writer.close();
I have also observed that from the second search onwards, the files in the index directory get deleted, and only the file with the .segment suffix changes, e.g. .segmenta, .segmentb, .segmentc, etc.
I don't know why this is happening.
Your code looks pretty straightforward, so I suspect you are facing this problem because something is going wrong with your index. Information about how you build the index might help to diagnose it.
But the exact code this time :)
I think your problem is the writer.deleteUnusedFiles() call.
According to the JavaDocs, this call can "delete unreferenced index commits".
Which commits get deleted is driven by the IndexDeletionPolicy.
However, "The default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2)".
The JavaDocs also talk about "delete on last close", which means that once this index has been used and closed (e.g. during a search), it will be deleted.
So all the indexes that matched your first search would be deleted immediately.
Try this:
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
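To observe the KeepOnlyLastCommitDeletionPolicy behaviour quoted from the JavaDocs in isolation, the sketch below makes two commits and counts the commit points left in the directory, once with the default policy and once with NoDeletionPolicy.INSTANCE. It assumes Lucene 8 or later (lucene-core only); the class name and the `id` field are hypothetical, and it only demonstrates commit retention, not the search behaviour described in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoDeletionPolicy;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CommitRetentionSketch {

    /** Makes two commits and returns how many commit points are still listed in the directory. */
    public static int commitsAfterTwoCommits(boolean keepAllCommits) {
        try (Directory dir = new ByteBuffersDirectory()) {
            IndexWriterConfig conf = new IndexWriterConfig(new StandardAnalyzer());
            if (keepAllCommits) {
                // Never delete old commits (the setting suggested in the answer).
                conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);
            } // otherwise the default KeepOnlyLastCommitDeletionPolicy applies
            try (IndexWriter writer = new IndexWriter(dir, conf)) {
                for (int i = 0; i < 2; i++) {
                    Document doc = new Document();
                    doc.add(new StringField("id", Integer.toString(i), Field.Store.YES));
                    writer.addDocument(doc);
                    writer.commit();
                }
                // Counted before close(), so only the two explicit commits are involved.
                return DirectoryReader.listCommits(dir).size();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default policy:   " + commitsAfterTwoCommits(false));
        System.out.println("NoDeletionPolicy: " + commitsAfterTwoCommits(true));
    }
}
```

With the default policy only the latest commit survives; with NoDeletionPolicy every commit point is kept, at the cost of the index directory growing with each commit.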

How to index and query country codes with Lucene?

I'm creating a Lucene index for city names and country codes (which depend on each other). I want the country codes to be searchable in lowercase and to match exactly.
As a first step, I am trying to query a single country code and find all indexed elements that match it. But my result is always empty.
//prepare
VERSION = Version.LUCENE_4_9;
IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer());
//index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Field.Store.YES));
writer.addDocument(doc);
//lookup
Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer()).parse(countryCode);
Result:
When I query for country codes like "IT", "DE", "EN" etc., the result is always empty. Why?
Is SimpleAnalyzer suitable for 2-letter country codes?
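StringField is not analyzed, so "DE" is indexed as the single term "DE", whereas QueryParser runs the query text through SimpleAnalyzer, which lowercases it to "de". For a StringField, use a TermQuery instead of QueryParser: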
Directory dir = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9, new SimpleAnalyzer(Version.LUCENE_4_9));
IndexWriter writer = new IndexWriter(dir, config);
String countryCode = "DE";
// index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Store.YES));
writer.addDocument(doc);
writer.close();
IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));
//lookup
Query query = new TermQuery(new Term("countryCode", countryCode));
TopDocs docs = search.search(query, 1);
System.out.println(docs.totalHits);
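The mismatch can be reproduced without QueryParser. This sketch assumes Lucene 8 or later (lucene-core only); `normalize` is a hypothetical helper that lowercases codes on both the indexing and the query side, which is one way to get the lowercase exact match the question asks for. Querying the raw term "de" mimics what QueryParser with SimpleAnalyzer produces from "DE".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class CountryCodeSketch {

    /** Hypothetical helper: one canonical form for codes, applied at index and query time. */
    static String normalize(String code) {
        return code.trim().toLowerCase(Locale.ROOT);
    }

    /** Indexes one country code as a StringField and counts TermQuery hits for queryCode. */
    public static int hits(String indexedCode, String queryCode, boolean normalizeBothSides) {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                String value = normalizeBothSides ? normalize(indexedCode) : indexedCode;
                // StringField skips the analyzer: the value is indexed verbatim as one term.
                doc.add(new StringField("countryCode", value, Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                String term = normalizeBothSides ? normalize(queryCode) : queryCode;
                // TermQuery is not analyzed either, so it must equal the indexed term exactly.
                return new IndexSearcher(reader).count(new TermQuery(new Term("countryCode", term)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hits("DE", "DE", false)); // exact term
        System.out.println(hits("DE", "de", false)); // what SimpleAnalyzer turns "DE" into
        System.out.println(hits("DE", "de", true));  // both sides lowercased
    }
}
```

Normalizing in application code keeps the field a true exact match (no tokenization), while still letting users type codes in any case.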
I'm a bit confused here. I'll assume your index writer is initialized in some part of your code you haven't shown, but why aren't you passing a Version into SimpleAnalyzer? There is no no-arg constructor for SimpleAnalyzer, not since 3.x anyway.
That's the only real issue I see. Here is a working example using your code:
private static Version VERSION;

public static void main(String[] args) throws IOException, ParseException {
    // prepare
    VERSION = Version.LUCENE_4_9;
    Directory dir = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer(VERSION));
    IndexWriter writer = new IndexWriter(dir, config);
    String countryCode = "DE";
    // index
    Document doc = new Document();
    doc.add(new TextField("countryCode", countryCode, Field.Store.YES));
    writer.addDocument(doc);
    writer.close();
    IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));
    // lookup
    Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer(VERSION)).parse(countryCode);
    TopDocs docs = search.search(query, 1);
    System.out.println(docs.totalHits);
}

Why doesn't Lucene find any documents with this code?

I am working on this piece of code, which adds a single document to a Lucene (4.7) index and then tries to find it by querying a term that definitely exists in the document. But the IndexSearcher doesn't return any documents. What is wrong with my code? Thank you for your comments and feedback.
String indexDir = "/home/richard/luc_index_03";
try {
    Directory directory = new SimpleFSDirectory(new File(indexDir));
    Analyzer analyzer = new SimpleAnalyzer(Version.LUCENE_47);
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_47, analyzer);
    conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
    conf.setRAMBufferSizeMB(256.0);
    IndexWriter indexWriter = new IndexWriter(directory, conf);
    Document doc = new Document();
    String title = "New York is an awesome city to live!";
    doc.add(new StringField("title", title, StringField.Store.YES));
    indexWriter.addDocument(doc);
    indexWriter.commit();
    indexWriter.close();
    directory.close();
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexDir)));
    IndexSearcher indexSearcher = new IndexSearcher(reader);
    String field = "title";
    SimpleQueryParser qParser = new SimpleQueryParser(analyzer, field);
    String queryText = "New York";
    Query query = qParser.parse(queryText);
    int hitsPerPage = 100;
    TopDocs results = indexSearcher.search(query, 5 * hitsPerPage);
    System.out.println("number of results: " + results.totalHits);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = results.totalHits;
    for (ScoreDoc scoreDoc : hits) {
        Document docC = indexSearcher.doc(scoreDoc.doc);
        String path = docC.get("path");
        String titleC = docC.get("title");
        String ne = docC.get("ne");
        System.out.println(path + "\n" + titleC + "\n" + ne);
        System.out.println("---*****----");
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
After running it, I just get:
number of results: 0
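This is because you use StringField. From the Javadoc:
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
So the index contains only the single term "New York is an awesome city to live!", while the query "New York" is analyzed into the terms "new" and "york", neither of which equals that term. Just use TextField instead and you should be OK.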
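To see the two field types side by side, here is a sketch assuming Lucene 8 or later (lucene-core only) and StandardAnalyzer, with hypothetical field names `title_exact` (StringField) and `title` (TextField) holding the same text. TermQuery is used so that no query-side analysis hides what was actually indexed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class StringVsTextFieldSketch {

    static final String TITLE = "New York is an awesome city to live!";

    /** Indexes TITLE into both field types and counts TermQuery hits for one raw term. */
    public static int count(String field, String termText) {
        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                // Not tokenized: the whole sentence is one single term.
                doc.add(new StringField("title_exact", TITLE, Field.Store.YES));
                // Tokenized and lowercased by StandardAnalyzer: "new", "york", ...
                doc.add(new TextField("title", TITLE, Field.Store.YES));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                return new IndexSearcher(reader).count(new TermQuery(new Term(field, termText)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("StringField, \"new\":      " + count("title_exact", "new"));
        System.out.println("TextField, \"new\":        " + count("title", "new"));
        System.out.println("StringField, full title: " + count("title_exact", TITLE));
    }
}
```

A query parser, as used in the question, lowercases and splits "New York" before searching, so it can match the TextField but never the single StringField term.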

Lucene - search does not return anything

I am using Lucene for search. Here is the code:
RAMDirectory index = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
IndexWriter w = new IndexWriter(index, config);
while (contentResutlset.next()) {
    System.out.println("Indexing Content no.(ID) " + contentResutlset.getString(1));
    Document doc = new Document();
    doc.add(new Field("uniquename", contentResutlset.getString(1), Store.YES, Index.ANALYZED));
    doc.add(new Field("type", contentResutlset.getString(2), Store.YES, Index.ANALYZED));
    doc.add(new Field("key", contentResutlset.getString(3), Store.YES, Index.ANALYZED));
    doc.add(new Field("value", contentResutlset.getString(4), Store.YES, Index.ANALYZED));
    w.addDocument(doc);
}
w.close();
contentResutlset.close();
statement.close();
connection.close();
Query q = new QueryParser(Version.LUCENE_34, "value", analyzer).parse("wordtosearch");
int hitsPerPage = 10;
IndexSearcher searcher = new IndexSearcher(index, true);
ScoreDoc[] topdocs = searcher.search(q, 1000).scoreDocs;
topdocs.length is 0.
What is wrong with the code above?
And how can I change it to store the index in a database instead of a RAMDirectory?
Should I use JDBCDirectory?
