Auto Suggestion not working in Lucene after first search iteration - java

Currently I am working on the auto-suggestion part of my application using Lucene. Auto-suggestion of words works fine in a console application, but now that I have integrated it into the web application it is not behaving the desired way.
When the documents are searched for the first time with some keyword, both search and auto-suggestion work fine and show results. But when I search again, for some other keyword or even the same one, neither the auto-suggestion nor the search results show anything. I am not able to figure out why this weird result occurs.
The snippets for auto-suggestion and search are as follows:
final int HITS_PER_PAGE = 20;
final String RICH_DOCUMENT_PATH = "F:\\Sample\\SampleRichDocuments";
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";

String searchText = request.getParameter("search_text");
BooleanQuery.Builder booleanQuery = null;
Query textQuery = null;
Query fileNameQuery = null;
try {
    textQuery = new QueryParser("content", new StandardAnalyzer()).parse(searchText);
    fileNameQuery = new QueryParser("title", new StandardAnalyzer()).parse(searchText);
    booleanQuery = new BooleanQuery.Builder();
    booleanQuery.add(textQuery, BooleanClause.Occur.SHOULD);
    booleanQuery.add(fileNameQuery, BooleanClause.Occur.SHOULD);
} catch (ParseException e) {
    e.printStackTrace();
}

Directory index = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexReader reader = DirectoryReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(HITS_PER_PAGE);
try {
    searcher.search(booleanQuery.build(), collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    for (ScoreDoc hit : hits) {
        Document doc = reader.document(hit.doc);
    }

    // Auto-suggestion over the indexed data
    Dictionary dictionary = new LuceneDictionary(reader, "content");
    AnalyzingInfixSuggester analyzingSuggester = new AnalyzingInfixSuggester(index, new StandardAnalyzer());
    analyzingSuggester.build(dictionary);
    List<LookupResult> lookupResultList = analyzingSuggester.lookup(searchText, false, 10);
    System.out.println("Look up result size :: " + lookupResultList.size());
    for (LookupResult lookupResult : lookupResultList) {
        System.out.println(lookupResult.key + " --- " + lookupResult.value);
    }
    analyzingSuggester.close();
    reader.close();
} catch (IOException e) {
    e.printStackTrace();
}
For example, in the first iteration, if I search for the word "sample":
Auto-suggestion gives me: sample, samples, sampler, etc. (these are words in the documents)
Search result: sample
But if I search again, with the same text or different text, no results are shown, and the LookupResult list size is zero.
I am not getting why this is happening. Please help.
Below is the updated code for index creation from a set of documents.
final String INDEX_DIRECTORY = "F:\\Sample\\LuceneIndexer";
long startTime = System.currentTimeMillis();
List<ContentHandler> contentHandlerList = new ArrayList<ContentHandler>();
String fileNames = (String) request.getAttribute("message");
File file = new File("F:\\Sample\\SampleRichDocuments" + fileNames);
ArrayList<File> fileList = new ArrayList<File>();
fileList.add(file);
Metadata metadata = new Metadata();

// Parsing the rich document set with Apache Tika
ContentHandler handler = new BodyContentHandler(-1);
ParseContext context = new ParseContext();
Parser parser = new AutoDetectParser();
InputStream stream = new FileInputStream(file);
try {
    parser.parse(stream, handler, metadata, context);
    contentHandlerList.add(handler);
} catch (TikaException e) {
    e.printStackTrace();
} catch (SAXException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        stream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

FieldType fieldType = new FieldType();
fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
fieldType.setStoreTermVectors(true);
fieldType.setStoreTermVectorPositions(true);
fieldType.setStoreTermVectorPayloads(true);
fieldType.setStoreTermVectorOffsets(true);
fieldType.setStored(true);

Analyzer analyzer = new StandardAnalyzer();
Directory directory = FSDirectory.open(new File(INDEX_DIRECTORY).toPath());
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
IndexWriter writer = new IndexWriter(directory, conf);

Iterator<ContentHandler> handlerIterator = contentHandlerList.iterator();
Iterator<File> fileIterator = fileList.iterator();
Date date = new Date();
while (handlerIterator.hasNext() && fileIterator.hasNext()) {
    Document doc = new Document();
    String text = handlerIterator.next().toString();
    String textFileName = fileIterator.next().getName();
    String fileName = textFileName.replaceAll("_", " ");
    fileName = fileName.replaceAll("-", " ");
    fileName = fileName.replaceAll("\\.", " ");
    String fileNameArr[] = fileName.split("\\s+");
    for (String contentTitle : fileNameArr) {
        Field titleField = new Field("title", contentTitle, fieldType);
        titleField.setBoost(2.0f);
        doc.add(titleField);
    }
    if (fileNameArr.length > 0) {
        fileName = fileNameArr[0];
    }
    String document_id = UUID.randomUUID().toString();
    FieldType documentFieldType = new FieldType();
    documentFieldType.setStored(false);
    Field idField = new Field("document_id", document_id, documentFieldType);
    Field fileNameField = new Field("file_name", textFileName, fieldType);
    Field contentField = new Field("content", text, fieldType);
    doc.add(idField);
    doc.add(contentField);
    doc.add(fileNameField);
    writer.addDocument(doc);
    analyzer.close();
}
writer.commit();
writer.deleteUnusedFiles();
long endTime = System.currentTimeMillis();
writer.close();
I have also observed that from the second search iteration onward, the files in the index directory are deleted and only the segments file keeps changing generation (segments_a, segments_b, segments_c, etc.).
I don't know why this weird situation is happening.

Your code looks pretty straightforward, so I suspect you are facing this problem because something is going wrong with your indexes. Providing information about how you are building your indexes might help diagnose the issue.
But exact code this time :)

I think your problem is with the writer.deleteUnusedFiles() call.
According to the JavaDocs, this call can "delete unreferenced index commits".
Which commits to delete is driven by the IndexDeletionPolicy.
However, "the default deletion policy is KeepOnlyLastCommitDeletionPolicy, which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2)."
It also mentions "delete on last close", which means that once an index commit has been used and closed (e.g. during a search), it will be deleted.
So all the index files that served your first search will be deleted immediately.
Try this:
IndexWriterConfig conf = new IndexWriterConfig(analyzer);
conf.setIndexDeletionPolicy(NoDeletionPolicy.INSTANCE);

Related

TermQuery not giving expected result as QueryParser - Lucene 7.4.0

I am indexing 10 text documents using StandardAnalyzer.
public static void indexDoc(final IndexWriter writer, Path filePath, long timestamp) {
    try (InputStream iStream = Files.newInputStream(filePath)) {
        Document doc = new Document();
        Field pathField = new StringField("path", filePath.toString(), Field.Store.YES);
        Field flagField = new TextField("ashish", "i am stored", Field.Store.YES);
        LongPoint last_modi = new LongPoint("last_modified", timestamp);
        Field content = new TextField("content", new BufferedReader(new InputStreamReader(iStream, StandardCharsets.UTF_8)));
        doc.add(pathField);
        doc.add(last_modi);
        doc.add(content);
        doc.add(flagField);
        if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
            System.out.println("Adding " + filePath.toString());
            writer.addDocument(doc);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Above is the code snippet used to index a document.
For testing purposes, I am searching a field called 'ashish'.
When I use QueryParser, Lucene gives the search results as expected.
public static void main(String[] args) throws Exception {
    String index = "E:\\Lucene\\Index";
    String field = "ashish";
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer();
    QueryParser parser = new QueryParser(field, analyzer);
    String line = "i am stored";
    Query query = parser.parse(line);
    // Query q = new TermQuery(new Term("ashish", "i am stored"));
    System.out.println("Searching for: " + query.toString());
    TopDocs results = searcher.search(query, 5 * hitsPerPage);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = Math.toIntExact(results.totalHits);
    System.out.println(numTotalHits + " total matching documents");
    for (int i = 0; i < numTotalHits; i++) {
        Document doc = searcher.doc(hits[i].doc);
        String path = doc.get("path");
        String content = doc.get("ashish");
        System.out.println(path + "\n" + content);
    }
}
The above code demonstrates the use of QueryParser to retrieve the desired field, and it works properly: it hits all 10 documents, as I am storing this field for all 10 documents. All good here.
However, when I use the TermQuery API, I don't get the desired result.
Here is the code change that I made for TermQuery.
public static void main(String[] args) throws Exception {
    String index = "E:\\Lucene\\Index";
    String field = "ashish";
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer();
    // QueryParser parser = new QueryParser(field, analyzer);
    String line = "i am stored";
    // Query query = parser.parse(line);
    Query q = new TermQuery(new Term("ashish", "i am stored"));
    System.out.println("Searching for: " + q.toString());
    TopDocs results = searcher.search(q, 5 * hitsPerPage);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = Math.toIntExact(results.totalHits);
    System.out.println(numTotalHits + " total matching documents");
    for (int i = 0; i < numTotalHits; i++) {
        Document doc = searcher.doc(hits[i].doc);
        String path = doc.get("path");
        String content = doc.get("ashish");
        System.out.println(path + "\n" + content);
        System.out.println("----------------------------------------------------------------------------------");
    }
}
I did some research on Stack Overflow itself (for example, Lucene TermQuery and QueryParser) but did not find a practical solution, and the Lucene versions in those examples were very old.
I would appreciate any help.
Thanks in advance!
I got the answer to my question in this post:
link that explains how TermQuery works
TermQuery searches for the entire string as it is. This behavior gives improper results, because while indexing, data is usually tokenized.
In the posted code, I was passing the entire search string to TermQuery, like:
Query q = new TermQuery(new Term("ashish","i am stored"));
In this case, Lucene looks for the term "i am stored" as it is, which it will never find, because at indexing time this string was tokenized.
Instead, I searched like this: Query q = new TermQuery(new Term("ashish","stored"));
The above query gave me the expected results.
thanks,
Ashish
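To make the tokenization mismatch concrete, here is a plain-Java sketch of the idea (no Lucene involved; the class and method names are made up for illustration): the analyzer turns the field value into individual lowercase tokens at index time, and a TermQuery-style lookup must match one of those tokens exactly, so the whole phrase can never match.

```java
import java.util.*;

// Toy model of an inverted index: the analyzer splits "i am stored"
// into lowercase tokens at index time, so only the individual tokens
// exist in the term dictionary -- never the whole phrase.
public class TermLookupSketch {
    static Set<String> indexedTerms(String fieldValue) {
        // Rough stand-in for StandardAnalyzer: lowercase, split on whitespace
        return new HashSet<>(Arrays.asList(fieldValue.toLowerCase().split("\\s+")));
    }

    // A TermQuery-style lookup: the query string is NOT analyzed;
    // it must match one indexed term exactly.
    static boolean termQueryMatches(Set<String> terms, String queryTerm) {
        return terms.contains(queryTerm);
    }

    public static void main(String[] args) {
        Set<String> terms = indexedTerms("i am stored");
        System.out.println(termQueryMatches(terms, "i am stored")); // false: the phrase was never a term
        System.out.println(termQueryMatches(terms, "stored"));      // true: the single token exists
    }
}
```

QueryParser works because it runs the same analysis on the query string, producing the tokens i, am, stored before looking them up.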
The real problem is that your query string is not being analyzed here. So use the same analyzer as was used while indexing documents, and try the code below to analyze the query string before searching.
IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("ashish", analyzer);
Query query = new TermQuery(new Term("ashish", "i am stored"));
query = parser.parse(query.toString());
ScoreDoc[] hits = searcher.search(query, 5).scoreDocs;

Java, Lucene: Search with numbers as String not working

I am working on integrating Lucene into our Spring-MVC based project, and currently it's working well, except for searches with numbers.
Whenever I try a search like 123Ab or 123, or anything which has numbers in it, I don't get back any search results.
As soon as I remove the numbers, though, it works fine.
Any suggestions? Thank you.
Code :
public List<Integer> searchLucene(String text, long groupId, boolean type) {
    List<Integer> objectIds = new ArrayList<>();
    if (text != null) {
        //String specialChars = "+ - && || ! ( ) { } [ ] ^ \" ~ * ? : \\ /";
        text = text.replace("+", "\\+");
        text = text.replace("-", "\\-");
        text = text.replace("&&", "\\&&");
        text = text.replace("||", "\\||");
        text = text.replace("!", "\\!");
        text = text.replace("(", "\\(");
        text = text.replace(")", "\\)");
        text = text.replace("{", "\\{");
        text = text.replace("}", "\\}");
        text = text.replace("[", "\\[");
        text = text.replace("]", "\\]");
        text = text.replace("^", "\\^");
        // text = text.replace("\"","\\\"");
        text = text.replace("~", "\\~");
        text = text.replace("*", "\\*");
        text = text.replace("?", "\\?");
        text = text.replace(":", "\\:");
        //text = text.replace("\\","\\\\");
        text = text.replace("/", "\\/");
        try {
            Path path;
            //Set system path code
            Directory directory = FSDirectory.open(path);
            IndexReader indexReader = DirectoryReader.open(directory);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            QueryParser queryParser = new QueryParser("contents", new SimpleAnalyzer());
            Query query = queryParser.parse(text + "*");
            TopDocs topDocs = indexSearcher.search(query, 50);
            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                org.apache.lucene.document.Document document = indexSearcher.doc(scoreDoc.doc);
                objectIds.add(Integer.valueOf(document.get("id")));
                System.out.println("");
                System.out.println("id " + document.get("id"));
                System.out.println("content " + document.get("contents"));
            }
            indexSearcher.getIndexReader().close();
            directory.close();
            return objectIds;
        } catch (Exception ignored) {
        }
    }
    return null;
}
Indexing code :
@Override
public void saveIndexes(String text, String tagFileName, String filePath, long groupId, boolean type, int objectId) {
    try {
        //indexing directory
        File testDir;
        Path path1;
        Directory index_dir;
        if (type) {
            // System path code
            Directory directory = org.apache.lucene.store.FSDirectory.open(path);
            IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
            IndexWriter indexWriter = new IndexWriter(directory, config);
            org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
            if (filePath != null) {
                File file = new File(filePath); // current directory
                doc.add(new TextField("path", file.getPath(), Field.Store.YES));
            }
            doc.add(new StringField("id", String.valueOf(objectId), Field.Store.YES));
            // doc.add(new TextField("id", String.valueOf(objectId), Field.Store.YES));
            if (text == null) {
                if (filePath != null) {
                    FileInputStream is = new FileInputStream(filePath);
                    BufferedReader reader = new BufferedReader(new InputStreamReader(is));
                    StringBuilder stringBuffer = new StringBuilder();
                    String line;
                    while ((line = reader.readLine()) != null) {
                        stringBuffer.append(line).append("\n");
                    }
                    stringBuffer.append("\n").append(tagFileName);
                    reader.close();
                    doc.add(new TextField("contents", stringBuffer.toString(), Field.Store.YES));
                }
            } else {
                text = text + "\n" + tagFileName;
                doc.add(new TextField("contents", text, Field.Store.YES));
            }
            indexWriter.addDocument(doc);
            indexWriter.commit();
            indexWriter.flush();
            indexWriter.close();
            directory.close();
        }
    } catch (Exception ignored) {
    }
}
I have tried with and without the wildcard, i.e. *. Thank you.
The issue is in your indexing code.
Your field contents is a TextField, and you are using a SimpleAnalyzer. If you look at the SimpleAnalyzer documentation, it says:
An Analyzer that filters LetterTokenizer with LowerCaseFilter
That means that if your field is tokenized, numbers will be removed.
Now look at the TextField code: a TextField is always tokenized, irrespective of it being TYPE_STORED or TYPE_NOT_STORED.
So if you wish to index letters and numbers, you need to use a StringField instead of a TextField.
From the StringField documentation:
A field that is indexed but not tokenized: the entire String value is
indexed as a single token. For example this might be used for a
'country' field or an 'id' field, or any field that you intend to use
for sorting or access through the field cache.
A StringField is never tokenized, irrespective of it being TYPE_STORED or TYPE_NOT_STORED.
So after indexing, the numbers have been stripped from the contents field, it is indexed without them, and you cannot find those patterns while searching.
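To see why the digits disappear, here is a rough plain-Java approximation of what LetterTokenizer plus LowerCaseFilter do (no Lucene required; the class name is made up for illustration): any non-letter character ends the current token and is discarded.

```java
import java.util.*;

// Plain-Java sketch of SimpleAnalyzer's behavior (LetterTokenizer +
// LowerCaseFilter): digits are not letters, so they act as token
// separators and never become index terms.
public class SimpleAnalyzerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c)); // LowerCaseFilter
            } else if (current.length() > 0) {            // non-letter ends the token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("123Ab"));    // [ab] -- the digits are gone
        System.out.println(tokenize("order 66")); // [order] -- "66" never becomes a term
    }
}
```

So 123Ab is indexed only as the term ab, and no query containing the digits can ever match it.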
Instead of QueryParser and complicated searches, first use a query like the one below to verify your indexed terms:
Query wildcardQuery = new WildcardQuery(new Term("contents", searchString));
TopDocs hits = searcher.search(wildcardQuery, 20);
Also, to decide whether debugging should focus on the indexer side or the searcher side, use the Luke tool to see if terms are created as per your needs. If the terms are there, you can focus on the searcher code.

can't delete document index in Lucene [duplicate]

This question already has an answer here:
can't delete document with lucene IndexWriter.deleteDocuments(term)
(1 answer)
Closed 6 years ago.
I build a search index for Lucene like this:
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
Field tagField = new Field("tag", joinListStr(gifModel.getTags()), Field.Store.YES, Field.Index.ANALYZED);
Field textField = new Field("text", gifModel.getText(), Field.Store.NO, Field.Index.ANALYZED);
doc.add(idField);
doc.add(tagField);
doc.add(textField);
iwriter.addDocument(doc);
I want to delete that document by Term via the _id field, according to this article:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexReader indexReader = IndexReader.open(directory);
        Term term = new Term("_id", id);
        int num = indexReader.deleteDocuments(term);
        indexReader.close();
        return new ReturnMap(num);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}
But here num is always 0, and the search results show the document is still in the index. What am I missing?
EDIT
I changed the IndexReader to an IndexWriter, but it is still not working:
public Map<String, Object> deleteIndexByMongoId(String id) {
    try {
        Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_CURRENT, new SmartChineseAnalyzer(Version.LUCENE_CURRENT));
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
        Term term = new Term("_id", id);
        indexWriter.deleteDocuments(term);
        indexWriter.close();
        return new ReturnMap(0);
    } catch (IOException e) {
        e.printStackTrace();
        return new ReturnMap(GifError.S_DELETE_INDEX_ERR, "delete index error");
    }
}
What version of Lucene are you using? IndexReader.deleteDocuments no longer exists; it was deprecated after Lucene 3.6. Either way, use the IndexWriter class:
Directory directory = FSDirectory.open(new File(GifMiaoMacro.LUCENE_INDEX_FILE));
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new SimpleAnalyzer());
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
Term term = new Term("_id", id);
indexWriter.deleteDocuments(term);
Field idField = new Field("_id", "58369c7e0293a47b09d34605", Field.Store.YES, Field.Index.NO);
It seems you have made the _id field unindexed (Field.Index.NO), so it cannot be searched even though it is stored. You will have to use a field that is searchable from the index.
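As a conceptual sketch of why the delete matches nothing (plain Java with made-up names, not the Lucene API): deleteDocuments(term) consults the inverted index, and a stored-but-unindexed field contributes no entry there.

```java
import java.util.*;

// Toy index model: only *indexed* fields contribute to the term
// dictionary; *stored-only* fields are kept for display but are
// invisible to term lookups, including delete-by-term.
public class DeleteByTermSketch {
    Map<String, Map<String, Set<Integer>>> invertedIndex = new HashMap<>();

    void addDoc(int docId, String field, String value, boolean indexed) {
        if (!indexed) return; // Field.Index.NO: nothing enters the term dictionary
        invertedIndex.computeIfAbsent(field, f -> new HashMap<>())
                     .computeIfAbsent(value, v -> new HashSet<>())
                     .add(docId);
    }

    int deleteDocuments(String field, String term) {
        Set<Integer> docs = invertedIndex.getOrDefault(field, Map.of()).get(term);
        return docs == null ? 0 : docs.size(); // number of matching docs deleted
    }

    public static void main(String[] args) {
        DeleteByTermSketch idx = new DeleteByTermSketch();
        idx.addDoc(1, "_id", "58369c7e0293a47b09d34605", false); // stored only, Index.NO
        idx.addDoc(1, "tag", "funny", true);                      // indexed normally
        System.out.println(idx.deleteDocuments("_id", "58369c7e0293a47b09d34605")); // 0: no term entry
    }
}
```

Indexing the id as a single untokenized token (in newer Lucene, a StringField) gives the term dictionary an exact entry for delete-by-term to match.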

Why doesn't Lucene find any documents with this code?

I am working on this piece of code, which adds a single document to a Lucene (4.7) index and then tries to find it by querying a term that certainly exists in the document. But the IndexSearcher doesn't return any documents. What is wrong with my code? Thank you for your comments and feedback.
String indexDir = "/home/richard/luc_index_03";
try {
    Directory directory = new SimpleFSDirectory(new File(indexDir));
    Analyzer analyzer = new SimpleAnalyzer(Version.LUCENE_47);
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_47, analyzer);
    conf.setOpenMode(OpenMode.CREATE_OR_APPEND);
    conf.setRAMBufferSizeMB(256.0);
    IndexWriter indexWriter = new IndexWriter(directory, conf);

    Document doc = new Document();
    String title = "New York is an awesome city to live!";
    doc.add(new StringField("title", title, StringField.Store.YES));
    indexWriter.addDocument(doc);
    indexWriter.commit();
    indexWriter.close();
    directory.close();

    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(indexDir)));
    IndexSearcher indexSearcher = new IndexSearcher(reader);
    String field = "title";
    SimpleQueryParser qParser = new SimpleQueryParser(analyzer, field);
    String queryText = "New York";
    Query query = qParser.parse(queryText);
    int hitsPerPage = 100;
    TopDocs results = indexSearcher.search(query, 5 * hitsPerPage);
    System.out.println("number of results: " + results.totalHits);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = results.totalHits;
    for (ScoreDoc scoreDoc : hits) {
        Document docC = indexSearcher.doc(scoreDoc.doc);
        String path = docC.get("path");
        String titleC = docC.get("title");
        String ne = docC.get("ne");
        System.out.println(path + "\n" + titleC + "\n" + ne);
        System.out.println("---*****----");
    }
} catch (IOException e) {
    e.printStackTrace();
}
After running I just get
number of results: 0
This is because you used StringField. From the Javadoc:
A field that is indexed but not tokenized: the entire String value is indexed as a single token.
Just use TextField instead and you should be fine.
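Here is a plain-Java sketch of the difference (no Lucene; the names are made up for illustration): StringField contributes the raw sentence as one term, while TextField and the query parser both produce lowercase word tokens, so only the TextField terms can line up with the analyzed query.

```java
import java.util.*;

// StringField vs TextField in miniature: StringField indexes the raw
// value as ONE term; TextField analyzes it into lowercase word terms.
public class FieldTypeSketch {
    static Set<String> stringFieldTerms(String value) {
        return Set.of(value); // single untokenized, un-lowercased term
    }

    static Set<String> textFieldTerms(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase().split("\\W+")));
    }

    public static void main(String[] args) {
        String title = "New York is an awesome city to live!";
        Set<String> queryTerms = Set.of("new", "york"); // what the analyzer makes of "New York"

        // StringField: no query term equals the whole sentence -> zero hits
        System.out.println(stringFieldTerms(title).stream().anyMatch(queryTerms::contains));
        // TextField: "new" and "york" are both real index terms -> hit
        System.out.println(textFieldTerms(title).containsAll(queryTerms));
    }
}
```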

Lucene Search Returns no results when the file contents are saved

I am trying to develop a log-querying system using Apache Lucene. I have written demo code to index two files and then search for a query string.
The first file contains the data
maclean
The second file contains the data
pinto
Below is the code that I used for indexing:
fis = new FileInputStream(file);
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
Document doc = new Document();
doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));
doc.add(new StoredField("filename", file.getCanonicalPath()));
if (indexWriter.getConfig().getOpenMode() == OpenMode.CREATE) {
    System.out.println("adding " + file);
    indexWriter.addDocument(doc);
} else {
    System.out.println("updating " + file);
    indexWriter.updateDocument(new Term("path", file.getPath()), doc);
}
If I use this code, I get the proper result, but in the display I can show only the file name, since that is all I have stored.
So I modified the code to store the file contents as well:
FileInputStream fis = null;
if (file.isHidden() || file.isDirectory() || !file.canRead() || !file.exists()) {
    return;
}
if (suffix != null && !file.getName().endsWith(suffix)) {
    return;
}
System.out.println("Indexing file " + file.getCanonicalPath());
try {
    fis = new FileInputStream(file);
} catch (FileNotFoundException fnfe) {
    System.out.println("File Not Found" + fnfe);
}
DataInputStream in = new DataInputStream(fis);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String strLine;
String Data = "";
while ((strLine = br.readLine()) != null) {
    Data = Data + strLine;
}
Document doc = new Document();
doc.add(new TextField("contents", Data, Field.Store.YES));
doc.add(new StoredField("filename", file.getCanonicalPath()));
if (indexWriter.getConfig().getOpenMode() == OpenMode.CREATE) {
    System.out.println("adding " + file);
    indexWriter.addDocument(doc);
} else {
    System.out.println("updating " + file);
    indexWriter.updateDocument(new Term("path", file.getPath()), doc);
}
According to my understanding, I should get one result, and it should show the file name and content of the file containing maclean.
But instead I get:
-----------------------Results--------------------------
0 total matching documents
Found 0
Is there anything wrong in my code, or is there a logical explanation for this? Why does the first version work while the second doesn't?
Search query code:
try {
    Directory directory = FSDirectory.open(indexDir);
    IndexReader reader = DirectoryReader.open(directory);
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_41);
    QueryParser parser = new QueryParser(Version.LUCENE_41, "contents", analyzer);
    Query query = parser.parse(queryStr);
    System.out.println("Searching for: " + query.toString("contents"));
    TopDocs results = searcher.search(query, maxHits);
    ScoreDoc[] hits = results.scoreDocs;
    int numTotalHits = results.totalHits;
    System.out.println("\n\n\n-----------------------Results--------------------------\n\n\n");
    System.out.println(numTotalHits + " total matching documents");
    for (int i = 0; i < numTotalHits; i++) {
        int docId = hits[i].doc;
        Document d = searcher.doc(docId);
        System.out.println(i + ":File name is: " + d.get("filename"));
        System.out.println(i + ":File content is: " + d.get("contents"));
    }
    System.out.println("Found " + numTotalHits);
} catch (Exception e) {
    System.out.println("Exception Was caused in SimpleSearcher");
    e.printStackTrace();
}
Use StoredField instead of TextField:
doc.add(new StoredField("Data", Line));
When you use a TextField, the string gets tokenized, and as a result you will not be able to retrieve the same string. A StoredField stores the entire string without tokenizing it.
I think your exact problem is that by the time you get to creating a BufferedReader for the indexed field, you have already read the whole file, and the stream is at the end of the file with nothing further to read. You should be able to fix that with a call to fis.reset();
However, you should not do that. Don't store the same data in two separate fields, one for indexing and one for storage. Instead, set the same field to both store and index the data. TextField has a constructor that allows you to store the data as well as index it, something like:
doc.add(new TextField("contents", Data, Field.Store.YES));
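The exhausted-stream point is easy to reproduce with plain Java I/O (no Lucene; the class and helper names are made up for illustration): after one reader consumes the stream, a second reader wrapped around the same stream is already at end-of-file.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Demonstrates the exhausted-stream problem: reading a stream to the
// end leaves nothing for a second reader wrapped around the same stream.
public class ExhaustedStreamDemo {
    // Reads the whole stream with one reader, then tries a second reader.
    // Returns what the second reader sees (null means end-of-file).
    static String secondReaderSees(InputStream stream) throws IOException {
        BufferedReader first = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        String data = "";
        String line;
        while ((line = first.readLine()) != null) {
            data = data + line; // same concatenation loop as the question's indexing code
        }
        BufferedReader second = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
        return second.readLine(); // the stream is already exhausted
    }

    public static void main(String[] args) throws IOException {
        InputStream fis = new ByteArrayInputStream("maclean\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(secondReaderSees(fis)); // null
    }
}
```

Note that a plain FileInputStream does not support mark()/reset(), so in practice reopening the file, or better, indexing and storing via a single TextField as suggested, is the reliable fix.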
I think there could be two problems with your code.
First, I notice that you did not use near-real-time search and did not commit the writer before reading. Lucene's IndexReader takes a snapshot of the index: either the committed version when NRT is not used, or both committed and uncommitted versions when NRT is used. That could be the reason your IndexReader fails to see the change. As it seems you require concurrent reading and writing, I recommend you use NRT search (IndexReader reader = DirectoryReader.open(indexWriter);).
The second problem could be that, as @femtoRgon said, the data you stored may not be what you expect. I notice that when you append the content of your file for storage, you seem to lose the EOL characters. I suggest you use Luke to check your index: http://www.getopt.org/luke/
This works in Lucene 4.5: doc.add(new TextField("Data", Data, Field.Store.YES));
