How to safely close an IndexReader? - java

My question is very simple: when I use IndexReader.openIfChanged(reader) to replace the previous reader, how do I safely close the old reader?
Here is the code (using Lucene 3.5):
IndexReader newReader = IndexReader.openIfChanged(reader);
if (newReader != null) {
    IndexReader oldReader = reader;
    IndexSearcher oldSearcher = searcher;
    reader = newReader;
    searcher = new IndexSearcher(newReader);
    oldSearcher.close();
    oldReader.close(); // or oldReader.decRef(); the result is the same
}
This code runs in a daemon thread every 5 seconds.
The IndexReader instance (the reader object) is globally unique.
Since making this change, I get an exception:
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:297)
at org.apache.lucene.index.IndexReader.getSequentialSubReaders(IndexReader.java:1622)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:98)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:517)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:487)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:400)
at org.zenofo.index.IndexManager.query(IndexManager.java:392)
...
IndexManager.java:392 uses the reader object (the globally unique IndexReader instance).
The IndexManager.query method receives a large number of concurrent requests, and all of them use the single, globally unique IndexReader instance (the reader object).
I need to close oldReader because of the file-handle problem described in:
Too many open files in Lucene indexing when number of users increase
Lucene Wiki: Too many open files
References:
API-IndexReader
API-IndexSearcher
How do I solve this problem?

Look at NRTManager and SearcherManager. You really don't have to handle this yourself.
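For a sense of what that buys you, here is a hedged sketch of the SearcherManager acquire/release pattern (constructor and method names follow the Lucene 3.5 API as best I recall; verify against the Javadoc of your exact version):
SearcherManager mgr = new SearcherManager(directory, null, null); // warmer and executor omitted for brevity

// In the 5-second daemon thread, instead of openIfChanged plus manual close:
mgr.maybeReopen(); // swaps in a fresh searcher and ref-counts the old one

// In each query thread:
IndexSearcher s = mgr.acquire();
try {
    // s.search(...)
} finally {
    mgr.release(s); // the old searcher is closed once its last user releases it
}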

You need to impose a happens-before relationship between the writes to the public static vars and the subsequent reads of them from other threads. If you use more than one var, you will also have an atomicity problem, so I recommend using only one var, since that is all you need.
Simply put, this would work for you:
public class SearcherManager
{
    public static volatile IndexSearcher searcher;

    private static void reopen() {
        // your code, just without the assignment to reader
    }
}
The key is the volatile modifier. Be sure to fully initialize everything before writing to the var, and do the closing of the old objects after the write; in other words, just keep doing it the way you are doing it now :)
But, as @MJB notes in his answer, you really should not be doing this yourself, since it is all built into Lucene. Check out the Javadoc on NRTManagerReopenThread to get all the info you need, including a full code sample.

I assume the searcher (later referred to as oldSearcher) is working on the reader (oldReader). In that case, closing the searcher also closes the reader it uses, so you don't need to close the reader yourself;
oldSearcher.close() is enough.

I don't see what oldReader and oldSearcher are doing at all!
Can't you just remove them along with their close() calls?
If you still need them, then my bet is that oldSearcher is somehow tied to oldReader, so calling close() on oldSearcher also closes oldReader, and that's why you get the exception.
Is that the whole of the code, or did you simplify it? If it is the whole of it, just remove oldReader and oldSearcher altogether.
Cheers

Take a look at the IndexReader reference-counting methods.
That is, increase the reference count when you instantiate a new IndexSearcher, with reader.incRef(), and decrease it when you are done with the search results, preferably in the finally block of a try/catch, with reader.decRef().
reader.decRef() automatically closes the reader when the reference count reaches 0.
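For example, a minimal sketch of that pattern against the Lucene 3.x API:
reader.incRef(); // pin the reader for the duration of this search
try {
    IndexSearcher searcher = new IndexSearcher(reader);
    // run the query and consume the results here
} finally {
    reader.decRef(); // closes the reader once the reference count drops to 0
}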


JVM not killed on SIGPIPE

What is the reason for the JVM handling SIGPIPE the way it does?
I would've expected for
java foo | head -10
with
import java.util.stream.Stream;

public class Foo {
    public static void main(String[] args) {
        Stream.iterate(0, n -> n + 1).forEach(System.out::println);
    }
}
to cause the process to be killed when writing the 11th line; however, that is not the case. Instead, it seems that only an error flag is set on the PrintStream, which can be checked through System.out.checkError().
What happens is that the JVM ignores SIGPIPE, so the failed write to the broken pipe results in an IOException rather than the process being killed.
For most OutputStream and Writer classes, this exception propagates through the write methods and has to be handled by the caller.
However, when you are writing to System.out, you are using a PrintStream, and that class by design takes care of the IOException for you. As the javadoc says:
A PrintStream adds functionality to another output stream, namely the ability to print representations of various data values conveniently. Two other features are provided as well. Unlike other output streams, a PrintStream never throws an IOException; instead, exceptional situations merely set an internal flag that can be tested via the checkError method.
What is the reason for the JVM handling SIGPIPE the way it does?
The above explains what is happening. The "why" is ... I guess ... that the designers wanted to make PrintStream easy to use for the typical use cases of System.out, where the caller doesn't want to deal with a possible IOException on every call.
Unfortunately, there is no elegant solution to this:
You could just call checkError ...
You should be able to get hold of the FileDescriptor.out object and wrap it in a new FileOutputStream object ... and use that instead of System.out, as sketched below.
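A minimal sketch of that second option (the endless loop is just to reproduce the pipe scenario; everything else is standard java.io):
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Foo {
    public static void main(String[] args) throws IOException {
        // Unlike PrintStream, this Writer propagates IOException to the caller.
        Writer out = new OutputStreamWriter(new FileOutputStream(FileDescriptor.out));
        int n = 0;
        while (true) {
            out.write(n++ + "\n");
            out.flush(); // a write to the closed pipe fails here with "Broken pipe"
        }
    }
}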
Note that there are no strong guarantees that the Java app will only write 10 lines of output in java foo | head -10. It is quite possible for the app to write ahead many lines, and to only "see" the pipe as closed after head has gotten around to reading the first 10 of them. This applies with System.out (and checkError) or if you wrap FileDescriptor.out.

Java - Append to files in loop or print entire String

The following code describes my problem:
private void transact(TreeSet<BankmanagerTransaction> set) {
    while (!set.isEmpty()) {
        BankmanagerTransaction transaction = set.pollFirst(); // pollFirst() also removes the element, or the loop never terminates
        execute(transaction);
        printBalance(transaction);
        printLedger(transaction);
        printJustifiedLedger(transaction);
    }
}
Every print function prints to a different file, so I'm wondering what best practice is here.
Is it better to build a string and print everything at once at the end of the transact method, or to write to the files line by line?
What I'm trying to get at is how long each file is being edited, and therefore held open, by the program. As far as I'm aware, I'd have to create a Writer for each file in the transact method and pass it to each of the three methods every time.
In most cases it's the same, but...
Your answer changes with different criteria; for example, the maximum String length is limited by RAM, so for very long output it is better to write line by line.
But every time you write (and flush), you access the disk. With remote file locations, if the connection is very unreliable or has high latency, you can build the entire String and write it once.
It depends on your situation; in general, try to make your code CLEAR.
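For what it's worth, a sketch of the "one Writer per file" variant the question hints at. The file names and the Writer-taking print overloads are hypothetical; note that BufferedWriter batches small writes in memory, so writing line by line does not hit the disk on every call (uses java.nio.file.Files and java.nio.file.Paths):
private void transact(TreeSet<BankmanagerTransaction> set) throws IOException {
    try (BufferedWriter balance = Files.newBufferedWriter(Paths.get("balance.txt"));
         BufferedWriter ledger = Files.newBufferedWriter(Paths.get("ledger.txt"));
         BufferedWriter justified = Files.newBufferedWriter(Paths.get("justified.txt"))) {
        while (!set.isEmpty()) {
            BankmanagerTransaction transaction = set.pollFirst();
            execute(transaction);
            printBalance(balance, transaction);           // hypothetical overloads that
            printLedger(ledger, transaction);             // take the Writer they print to
            printJustifiedLedger(justified, transaction);
        }
    }
}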

Jmock - how to automate & mock out console user input?

I have some functionality that I want to mock out that is called from main (static: I've read about that too - jmock mocking a static method). I recently read that JMock doesn't support the mocking of static functions. Well, the associated code (that's giving me a problem) must be called from main, and must be in the class with main...
Sample source
Test code
Right now, I want a test that makes sure the file exists before my main proceeds. The problem is that my program gets user input from the console, so I don't know how to mock that out. Do I just go down to that level of granularity, specifying at every point along the way what happens, so that I can isolate the single operation that returns the user's input in its own function? I know that for the tests to be written well, they should not ask for user input when they run; I should be specifying it in my tests somehow.
I think it has to do with the following:
How to use JMock to test mocked methods inside a mocked method
I'm not that good with JMock...
If the readInput() method does something like, say:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
return in.readLine();
Then you might be able to get away with a test that goes something like:
InputStream oldSystemIn = System.in;
InputStream mockSystemIn = context.mock(InputStream.class);
System.setIn(mockSystemIn);
context.checking(new Expectations() {{
    // mock expected method calls and return values
}});
// execute
// verify
System.setIn(oldSystemIn);
You can use System Rules instead of mocking System.out and System.in.
public class MyTest {
    @Rule
    public TextFromStandardInputStream systemInMock = emptyStandardInputStream();

    @Test
    public void readTextFromStandardInputStream() {
        systemInMock.provideText("your file name");
        // your code that reads "your file name" from System.in
    }
}
Stefan Birkner's answer gave me the direction I needed to solve this. I have posted the code that I used to solve it below.
Solved tests: Birkner's version (recommended)
Solved tests: piped version
Changed source:
WHY: With Birkner's library, you can only ever read as much input as you instantiated the rule with originally. If you want to write to the endpoint iteratively, you can do so with a pipe hack, but it doesn't make much of a difference: you can't write to the input over the pipe while the function under test is actually running, so you might as well use Birkner's version; his @Rule is more concise.
Explanation: With both the pipe hack and Birkner's code, if the client being tested creates more than one object that reads from System.in, you get a blocking problem: once the first object has opened a connection to the pipe or to System.in, the others cannot. I don't know exactly why this is with Birkner's code, but with the pipe I think it's because you can only ever open one stream to the object. Notice that if you close the first BufferedReader and then try to reopen System.in in your client code after having called it from the test, the second attempt to open will fail, because the pipe has been closed on the writer's side as well.
Solution: The easy way to solve this, and probably not the best because it requires modifying the source of the actual project (though not in a horrendous way, yet): instead of creating multiple BufferedReaders in the project source, create one buffered reader and pass the same reader reference around, or make it a private variable of the class. Remember that if you have to declare it static, you should not initialize it in a static context, because if you do, System.setIn will be called AFTER the reader has been initialized in your client when the tests run. It would then hang on every readLine call, just as it would if you tried to create multiple objects from System.in.
Notice that to keep reads segregated between calls to your reader, in this case a BufferedReader, you can separate them with newlines in the original setup. That way, each call in the client being tested returns what you want.
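A sketch of the refactoring described above; the class and method names are made up for illustration. The point is that the reader wrapping System.in is created lazily, after the test has had its chance to call System.setIn:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ConsoleClient {
    // Deliberately NOT initialized in a static context; see the caveat above.
    private static BufferedReader in;

    private static BufferedReader in() {
        if (in == null) {
            in = new BufferedReader(new InputStreamReader(System.in));
        }
        return in;
    }

    // Every console read in the class goes through the same reader.
    static String promptForFileName() throws IOException {
        return in().readLine();
    }
}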

Using Apache Velocity with StringBuilders/CharSequences

We are using Apache Velocity for dynamic templates. At the moment Velocity has the following methods for evaluation/replacing:
public static boolean evaluate(Context context, Writer writer, String logTag, Reader reader)
public static boolean evaluate(Context context, Writer out, String logTag, String instring)
We use these methods by providing a StringWriter to receive the evaluation results. Our incoming data comes as a StringBuilder, so we call StringBuilder.toString() and feed it in as instring.
The problem is that our templates are fairly large (they can be megabytes, tens of megabytes in rare cases), replacements occur very frequently, and each replacement operation triples the required memory (incoming data + the new copy created by StringBuilder.toString() + outgoing data).
I was wondering if there is a way to improve this. E.g. if I could find a way to provide a Reader and a Writer on top of the same StringBuilder instance, using extra memory only for the in/out differences, would that be a good approach? Has anybody done anything similar and could share the source for such a class? Or is there a better solution to the given problem?
Velocity needs to parse the whole template before it can be evaluated. You won't be able to provide a Reader and Writer to gain anything in a single evaluation. You could however break up your templates into smaller parts to evaluate them individually. That's going to depend on what's in them and if the parts would depend on each other. And the overhead might not be worth it, depending on your situation.
If you're only dealing with variable substitution in your templates, you could simply evaluate each line of your input. Ideally you can intercept the data before it goes into the StringBuilder. Otherwise you'll still incur the cost of that memory, plus the toString() copy that you'd feed into a BufferedReader to make readLine() calls against.
If there are #set directives you'll need to keep passing the same context for evaluation. If there are any #if or #foreach blocks it's going to get tricky. I have actually done this before and read in enough lines to capture the block of input for Velocity to parse and evaluate. At that point however you're starting to do Velocity's job and it's probably not worth it.
You can save one copy of the string by reading the value field of the StringBuilder through reflection and creating a CharArrayReader on it:
StringBuilder sb = new StringBuilder("bla");
// AbstractStringBuilder.value is the internal char[] buffer
Field valueField = StringBuilder.class.getSuperclass().getDeclaredField("value");
valueField.setAccessible(true);
char[] value = (char[]) valueField.get(sb);
Reader r = new CharArrayReader(value, 0, sb.length());
Yikes. That's a pretty heavyweight use for evaluate(). I assume you have good reasons for not using the standard resource loader stuff, so I won't pontificate. :)
I haven't heard of any existing solution that would fit this, but since Reader is not a particularly complicated class, my instinct would be to just create your own StringBuilderReader class and pass that in.
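A minimal sketch of such a class: it copies characters straight out of the StringBuilder with getChars, so no toString() snapshot is needed (it assumes the builder is not mutated while being read):
import java.io.Reader;

public class StringBuilderReader extends Reader {
    private final StringBuilder sb;
    private int pos = 0;

    public StringBuilderReader(StringBuilder sb) {
        this.sb = sb;
    }

    public int read(char[] cbuf, int off, int len) {
        if (pos >= sb.length()) {
            return -1; // end of input
        }
        int n = Math.min(len, sb.length() - pos);
        sb.getChars(pos, pos + n, cbuf, off); // copy without toString()
        pos += n;
        return n;
    }

    public void close() {
        // nothing to release
    }
}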

Java 1.4 singleton containing a mutable field

I'm working on a legacy Java 1.4 project, and I have a factory that instantiates a CSV file parser as a singleton.
In my CSV file parser, however, I have a HashSet that stores objects created from each line of my CSV file. All of this is used by a web application, and users will be uploading CSV files, possibly concurrently.
Now my question is: what is the best way to prevent my set of objects from being modified by 2 users at once?
So far, I'm doing the following :
final class MyParser {
    private File csvFile = null;
    private Set myObjects = Collections.synchronizedSet(new HashSet());

    public synchronized void setFile(File file) {
        this.csvFile = file;
    }

    public void parse() {
        FileReader fr = null;
        try {
            fr = new FileReader(csvFile);
            synchronized (myObjects) {
                myObjects.clear();
                while (...) { // for each line of the CSV, create a "MyObject"
                    myObjects.add(new MyObject(...));
                }
            }
        } catch (Exception e) {
            // ...
        } finally {
            if (fr != null) {
                try {
                    fr.close();
                } catch (IOException ioe) {
                    // ignore
                }
            }
        }
    }
}
Should I leave the lock only on the myObjects Set, or should I declare the whole parse() method as synchronized ?
Also, how should I synchronize both the setting of the csvFile and the parsing? I feel like my current design is broken, because threads could change the CSV file several times while a possibly long parse is running.
I hope I'm being clear enough, because I am myself a bit confused by these multi-synchronization issues.
Thanks ;-)
Basically you are assuming callers need to call setFile first and then call parse(). Consider this:
t1 (with setFile XX) and t2 (with setFile YY) arrive at the same time, and t2 sets the file to YY. Then t1 calls parse() and starts getting records from YY. No amount of synchronized is going to solve this for you; the only way out is to have the parse method take a File parameter, or to remove the singleton constraint (so that each thread has its own file object). So use:
public void parse(File file) // and add synchronized if you want
I think there are multiple issues with this code.
If this class is a singleton, it should be stateless, i.e. no state should be kept in the class; having a setter for the file is therefore not the right thing to do. Pass the File object into the parse method and let it work on the argument; this fixes your issue of synchronizing across the two methods (see the sketch below).
Though your myObjects Set is private, I am assuming you are not passing it to any calling classes. In case you are, always return a clone of the set, to avoid callers making changes to the original.
Synchronizing on the Set object is good enough, as long as all your changes to it happen within the synchronized block.
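A sketch of that stateless variant, in Java 1.4 syntax (no generics, no try-with-resources); the names mirror the question's code:
final class MyParser {
    public Set parse(File csvFile) throws IOException {
        Set result = new HashSet(); // local state, so nothing is shared between threads
        FileReader fr = new FileReader(csvFile);
        try {
            // for each line of the CSV, create a MyObject:
            // result.add(new MyObject(...));
        } finally {
            fr.close();
        }
        return result;
    }
}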
Use a separate MyParser object for every parse request and you will not have to deal with concurrency (at least not in MyParser). Only then will you be able to truly serve multiple users at a time, rather than forcing them to wait or erasing the results of previous parsing jobs.
The singleton thing is mostly a red herring; it has nothing to do with the concurrency issues you are considering. As far as synchronisation goes, I think you are OK. Making the method synchronized will also work, because the class is a singleton: there is only one instance, hence one lock guarding myObjects.
