I'm creating a Lucene index for city names and country codes (which depend on each other). I want the country codes to be searchable in lowercase and by exact match.
As a first step, I am trying to query a single country code and find all indexed elements that match it. But my result is always empty.
//prepare
VERSION = Version.LUCENE_4_9;
IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer());
//index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Field.Store.YES));
writer.addDocument(doc);
//lookup
Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer()).parse(countryCode);
Result:
When I query for country codes like "IT", "DE", "EN" etc., the result is always empty. Why?
Is SimpleAnalyzer the wrong choice for 2-letter country codes?
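For a StringField, you can use TermQuery instead of QueryParser. A StringField is indexed as-is ("DE"), whereas QueryParser runs the input through SimpleAnalyzer, which lowercases it to "de", so the two never match: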
Directory dir = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9, new SimpleAnalyzer(Version.LUCENE_4_9));
IndexWriter writer = new IndexWriter(dir, config);
String countryCode = "DE";
// index
Document doc = new Document();
doc.add(new StringField("countryCode", countryCode, Store.YES));
writer.addDocument(doc);
writer.close();
IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));
//lookup
Query query = new TermQuery(new Term("countryCode", countryCode));
TopDocs docs = search.search(query, 1);
System.out.println(docs.totalHits);
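To make the mismatch concrete, here is a minimal plain-Java sketch of the two analysis paths. It is not Lucene code: simpleAnalyze is a hypothetical stand-in that only approximates SimpleAnalyzer (split on non-letters, lowercase), and stringFieldTerms models the fact that a StringField value is indexed untouched as a single term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of index-time vs. query-time terms; this is NOT Lucene code.
class CountryCodeAnalysisSketch {

    // Rough approximation of SimpleAnalyzer: split on non-letters, lowercase tokens.
    static List<String> simpleAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // What a StringField contributes: the whole value, unchanged, as one term.
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        List<String> indexed = stringFieldTerms("DE"); // [DE]
        List<String> queried = simpleAnalyze("DE");    // [de]
        System.out.println(indexed.containsAll(queried)); // false: "de" is not "DE"

        // Lowercasing before indexing makes the analyzed query term match.
        List<String> indexedLower = stringFieldTerms("DE".toLowerCase(Locale.ROOT));
        System.out.println(indexedLower.containsAll(queried)); // true
    }
}
```

If lowercase, exact-match lookup is the goal, one option under these assumptions is to lowercase the code yourself before adding it to the StringField and before building the Term for the TermQuery.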
I'm a bit confused here. I'll assume your index writer is initialized in some part of your code not shown, but why aren't you passing a Version into SimpleAnalyzer? There is no no-arg constructor for SimpleAnalyzer, not since 3.x, anyway.
That's the only real issue I see. Here is a working example based on your code. Note that it indexes the country code with TextField rather than StringField, so the value is analyzed at index time just as it is at query time:
private static Version VERSION;
public static void main(String[] args) throws IOException, ParseException {
//prepare
VERSION = Version.LUCENE_4_9;
Directory dir = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(VERSION, new SimpleAnalyzer(VERSION));
IndexWriter writer = new IndexWriter(dir, config);
String countryCode = "DE";
//index
Document doc = new Document();
doc.add(new TextField("countryCode", countryCode, Field.Store.YES));
writer.addDocument(doc);
writer.close();
IndexSearcher search = new IndexSearcher(DirectoryReader.open(dir));
//lookup
Query query = new QueryParser(VERSION, "countryCode", new SimpleAnalyzer(VERSION)).parse(countryCode);
TopDocs docs = search.search(query, 1);
System.out.println(docs.totalHits);
}
Related
I am trying to implement search suggestions for my app. Actually I need a kind of "multi-term prefix query" and I was trying to use a PrefixCompletionQuery. The problem is that an IllegalArgumentException is thrown when "search" or "suggest" methods are called from a SuggestIndexSearcher object.
I wrote some sample code to reproduce the problem:
public static void main(String[] args) throws IOException {
RAMDirectory dir = new RAMDirectory(); //just for this experiment
Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer));
var doc = new Document();
doc.add(new SuggestField("suggest", "Hi everybody!",4));
writer.addDocument(doc);
doc = new Document();
doc.add(new SuggestField("suggest", "nice to meet you",4));
writer.addDocument(doc);
writer.commit(); // maybe redundant
writer.close();
var reader = DirectoryReader.open(dir);
var searcher = new SuggestIndexSearcher(reader);
var query = new PrefixCompletionQuery(analyzer, new Term("suggest", "everyb"));
TopDocs results = searcher.search(query, 5);
for (var res : results.scoreDocs) {
System.out.println(reader.document(res.doc).get("id"));
}
}
And this is what I get:
Exception in thread "main" java.lang.IllegalArgumentException: suggest is not a SuggestField
at org.apache.lucene.search.suggest.document.CompletionWeight.bulkScorer(CompletionWeight.java:86)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
at experiments.main.main(main.java:67) #TopDocs results = searcher.search(query, 5);
To be as complete as possible: the project depends on lucene-core 8.8.2 and lucene-suggest 8.8.2.
Where am I wrong?
I think you have to change the postings format of your suggestion field by adding a custom codec to your index writer.
For example something like this:
RAMDirectory dir = new RAMDirectory();
Analyzer analyzer = new CompletionAnalyzer(new StandardAnalyzer());
IndexWriterConfig config = new IndexWriterConfig(analyzer);
Codec codec = new Lucene87Codec() {
@Override
public PostingsFormat getPostingsFormatForField(String field) {
if (field.equals("suggest")) {
return new Completion84PostingsFormat();
}
return super.getPostingsFormatForField(field);
}
};
config.setCodec(codec);
IndexWriter indexWriter = new IndexWriter(dir, config);
I am indexing 10 text documents using StandardAnalyzer.
public static void indexDoc(final IndexWriter writer, Path filePath, long timstamp)
{
try (InputStream iStream = Files.newInputStream(filePath))
{
Document doc = new Document();
Field pathField = new StringField("path",filePath.toString(),Field.Store.YES);
Field flagField = new TextField("ashish","i am stored",Field.Store.YES);
LongPoint last_modi = new LongPoint("last_modified",timstamp);
Field content = new TextField("content",new BufferedReader(new InputStreamReader(iStream,StandardCharsets.UTF_8)));
doc.add(pathField);
doc.add(last_modi);
doc.add(content);
doc.add(flagField);
if(writer.getConfig().getOpenMode()==OpenMode.CREATE)
{
System.out.println("Adding "+filePath.toString());
writer.addDocument(doc);
}
} catch (IOException e) {
e.printStackTrace();
}
}
Above is the code snippet used to index a document.
For testing purposes, I am searching a field called 'ashish'.
When I use QueryParser, Lucene gives the search results as expected.
public static void main(String[] args) throws Exception
{
String index = "E:\\Lucene\\Index";
String field = "ashish";
int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser(field, analyzer);
String line = "i am stored";
Query query = parser.parse(line);
// Query q = new TermQuery(new Term("ashish","i am stored"));
System.out.println("Searching for: " + query.toString());
TopDocs results = searcher.search(query, 5 * hitsPerPage);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = Math.toIntExact(results.totalHits);
System.out.println(numTotalHits + " total matching documents");
for(int i=0;i<numTotalHits;i++)
{
Document doc = searcher.doc(hits[i].doc);
String path = doc.get("path");
String content = doc.get("ashish");
System.out.println(path+"\n"+content);
}
}
The code above uses QueryParser to search the desired field, and it works properly: it hits all 10 documents, since I store this field for all 10 documents. All good here.
However, when I use the TermQuery API, I don't get the desired result.
Here is the change I made to use TermQuery:
public static void main(String[] args) throws Exception
{
String index = "E:\\Lucene\\Index";
String field = "ashish";
int hitsPerPage = 10;
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer();
// QueryParser parser = new QueryParser(field, analyzer);
String line = "i am stored";
// Query query = parser.parse(line);
Query q = new TermQuery(new Term("ashish","i am stored"));
System.out.println("Searching for: " + q.toString());
TopDocs results = searcher.search(q, 5 * hitsPerPage);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = Math.toIntExact(results.totalHits);
System.out.println(numTotalHits + " total matching documents");
for(int i=0;i<numTotalHits;i++)
{
Document doc = searcher.doc(hits[i].doc);
String path = doc.get("path");
String content = doc.get("ashish");
System.out.println(path+"\n"+content);
System.out.println("----------------------------------------------------------------------------------");
}
}
I am also attaching a screenshot of the TermQuery execution.
I did some research on Stack Overflow itself, for example "Lucene TermQuery and QueryParser", but did not find any practical solution; the Lucene version in those examples was also very old.
I would appreciate any help.
Thanks in advance!
I found the answer to my question in this post
(link), which explains how TermQuery works.
TermQuery searches for the entire string exactly as given. That gives unexpected results, because data is usually tokenized while it is indexed.
In the posted code, I was passing the entire search string to TermQuery like this:
Query q = new TermQuery(new Term("ashish","i am stored"));
In this case Lucene looks for the single term "i am stored", which it will never find, because the string was tokenized into separate terms at indexing time.
Instead, I tried searching like this: Query q = new TermQuery(new Term("ashish","stored"));
That query gave me the expected results.
thanks,
Ashish
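The behaviour described in this answer can be illustrated with a toy inverted index in plain Java. This is only a sketch, not Lucene code: addAnalyzed is a hypothetical stand-in that approximates StandardAnalyzer by lowercasing and splitting on whitespace, and termLookup models the way TermQuery looks up its term verbatim, without any analysis.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: term -> ids of documents containing it. NOT Lucene code.
class TermLookupSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index-time analysis (approximation): lowercase, split on whitespace,
    // and record every resulting token as a separate term.
    void addAnalyzed(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // TermQuery-like lookup: the term is used exactly as given.
    Set<Integer> termLookup(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch();
        index.addAnalyzed(0, "i am stored"); // terms: "i", "am", "stored"

        System.out.println(index.termLookup("i am stored")); // [] (no such single term)
        System.out.println(index.termLookup("stored"));      // [0]
    }
}
```

In this model the index only ever contains the individual tokens, so a lookup succeeds only when the term passed in is one of them, in the same case.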
The real problem is that your query string is not being analyzed. Use the same analyzer that was used to index the documents, and analyze the query string before searching, for example:
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("ashish", analyzer);
Query query = new TermQuery(new Term("ashish", "i am stored"));
query = parser.parse(query.toString());
ScoreDoc[] hits = searcher.search(query, 5).scoreDocs;
I build a search index for Lucene like this:
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
Field tagField = new Field("tag", joinListStr(gifModel.getTags()), Field.Store.YES, Field.Index.ANALYZED);
Field textField = new Field("text", gifModel.getText(), Field.Store.NO, Field.Index.ANALYZED);
doc.add(idField);
doc.add(tagField);
doc.add(textField);
iwriter.addDocument(doc);
I want to delete that document by Term via the _id field, according to this article:
public Map<String, Object> deleteIndexByMongoId(String id) {
try {
Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexReader indexReader = IndexReader.open(directory);
Term term = new Term("_id", id);
int num = indexReader.deleteDocuments(term);
indexReader.close();
return new ReturnMap(num);
}catch (IOException e){
e.printStackTrace();
return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
}
}
But num is always 0 here, and a search shows the document is still in the index. What am I missing?
EDIT
I changed the IndexReader to an IndexWriter, but it still does not work:
public Map<String, Object> deleteIndexByMongoId(String id) {
try {
Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_CURRENT, new SmartChineseAnalyzer(Version.LUCENE_CURRENT));
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
Term term = new Term("_id", id);
indexWriter.deleteDocuments(term);
indexWriter.close();
return new ReturnMap(0);
}catch (IOException e){
e.printStackTrace();
return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
}
}
What version of Lucene are you using? IndexReader.deleteDocuments no longer exists; it was deprecated in Lucene 3.x and removed in 4.0. Either way, use the IndexWriter class:
Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
Term term = new Term("_id", id);
indexWriter.deleteDocuments(term);
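IndexWriter.deleteDocuments(term) can only match terms that are actually in the index. You declare the field like this:
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
It seems you have made the _id field unindexed (Field.Index.NO), so it cannot be searched or matched by a Term, even though it is stored. You will have to use a field that is searchable in the index, for example by indexing _id without analysis (Field.Index.NOT_ANALYZED) so that the whole id becomes one term.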
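The difference between "stored" and "indexed" can be sketched with a small plain-Java model. This is not Lucene code: the class, its methods and the store/index flags are hypothetical names used only to illustrate that a delete-by-term consults the indexed terms, never the stored values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of stored values vs. indexed terms. NOT Lucene code.
// Simplification: each document has a single field.
class StoredVsIndexedSketch {
    // Stored values: docId -> (field -> value), what doc.get(field) would return.
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();
    // Indexed terms: "field:value" -> ids of documents containing that term.
    private final Map<String, Set<Integer>> indexed = new HashMap<>();
    private int nextDocId = 0;

    int addDocument(String field, String value, boolean store, boolean index) {
        int docId = nextDocId++;
        stored.put(docId, new HashMap<>());
        if (store) {
            stored.get(docId).put(field, value);
        }
        if (index) {
            indexed.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
        }
        return docId;
    }

    // Delete every document that contains the given indexed term.
    int deleteByTerm(String field, String value) {
        Set<Integer> hits = indexed.remove(field + ":" + value);
        if (hits == null) {
            return 0; // the term is not in the index, so nothing can match
        }
        for (int docId : hits) {
            stored.remove(docId);
        }
        return hits.size();
    }

    String storedValue(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch idx = new StoredVsIndexedSketch();
        String id = "58369c7e0293a47b09d34605";
        idx.addDocument("_id", id, true, false);          // stored only
        System.out.println(idx.deleteByTerm("_id", id));  // 0
        idx.addDocument("_id", id, true, true);           // stored and indexed
        System.out.println(idx.deleteByTerm("_id", id));  // 1
    }
}
```

In this model the first document can still be read back through its stored value, but no term points at it, which mirrors a delete that reports nothing removed.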
I am working on this piece of code, which adds a single document to a Lucene (4.7) index and then tries to find it by querying a term that definitely exists in the document. But indexSearcher doesn't return any document. What is wrong with my code? Thank you for your comments and feedback.
String indexDir = "/home/richard/luc_index_03";
try {
Directory directory = new SimpleFSDirectory(new File(
indexDir));
Analyzer analyzer = new SimpleAnalyzer(
Version.LUCENE_47);
IndexWriterConfig conf = new IndexWriterConfig(
Version.LUCENE_47, analyzer);
conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
conf.setRAMBufferSizeMB(256.0);
IndexWriter indexWriter = new IndexWriter(
directory, conf);
Document doc = new Document();
String title="New York is an awesome city to live!";
doc.add(new StringField("title", title, StringField.Store.YES));
indexWriter.addDocument(doc);
indexWriter.commit();
indexWriter.close();
directory.close();
IndexReader reader = DirectoryReader
.open(FSDirectory.open(new File(
indexDir)));
IndexSearcher indexSearcher = new IndexSearcher(
reader);
String field="title";
SimpleQueryParser qParser = new SimpleQueryParser(analyzer, field);
String queryText="New York" ;
Query query = qParser.parse(queryText);
int hitsPerPage = 100;
TopDocs results = indexSearcher.search(query, 5 * hitsPerPage);
System.out.println("number of results: "+results.totalHits);
ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = results.totalHits;
for (ScoreDoc scoreDoc:hits){
Document docC = indexSearcher.doc(scoreDoc.doc);
String path = docC.get("path");
String titleC = docC.get("title");
String ne = docC.get("ne");
System.out.println(path+"\n"+titleC+"\n"+ne);
System.out.println("---*****----");
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
After running I just get
number of results: 0
This is because you use StringField. From the javadoc:
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
Just use TextField instead, so that the title is tokenized by the analyzer, and you should be OK.
I am using Lucene to search. Here is the code:
RAMDirectory index = new RAMDirectory();
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
IndexWriter w = new IndexWriter(index, config);
while(contentResutlset.next()){
System.out.println("Indexing Content no.(ID) " + contentResutlset.getString(1));
Document doc = new Document();
doc.add(new Field("uniquename",contentResutlset.getString(1),Store.YES,Index.ANALYZED));
doc.add(new Field("type",contentResutlset.getString(2),Store.YES,Index.ANALYZED));
doc.add(new Field("key",contentResutlset.getString(3),Store.YES,Index.ANALYZED));
doc.add(new Field("value",contentResutlset.getString(4),Store.YES,Index.ANALYZED));
w.addDocument(doc);
}
w.close();
contentResutlset.close();
statement.close();
connection.close();
Query q = new QueryParser(Version.LUCENE_34, "value", analyzer).parse("wordtosearch");
int hitsPerPage = 10;
IndexSearcher searcher = new IndexSearcher(index, true);
ScoreDoc[] topdocs = searcher.search(q, 1000).scoreDocs;
topdocs.length is 0.
What is wrong with the code above?
And how can I change it to store the index in a database instead of a RAMDirectory?
Should I use JDBCDirectory?