I'm working on a legacy Java 1.4 project, and I have a factory that instantiates a csv file parser as a singleton.
In my csv file parser, however, I have a HashSet that will store objects created from each line of my CSV file. All that will be used by a web application, and users will be uploading CSV files, possibly concurrently.
Now my question is: what is the best way to prevent my collection of objects from being modified by two users at the same time?
So far, I'm doing the following:
final class MyParser {
private File csvFile = null;
private Set myObjects = Collections.synchronizedSet(new HashSet());
public synchronized void setFile(File file) {
this.csvFile = file;
}
public void parse() {
FileReader fr = null;
try {
fr = new FileReader(csvFile);
synchronized(myObjects) {
myObjects.clear();
while(...) { // foreach line of my CSV, create a "MyObject"
myObjects.add(new MyObject(...));
}
}
} catch (Exception e) {
//...
}
}
}
Should I leave the lock only on the myObjects Set, or should I declare the whole parse() method as synchronized?
Also, how should I synchronize both the setting of csvFile and the parsing? I feel that my current design is broken, because threads could change the CSV file several times while a possibly long parse is running.
I hope I'm being clear enough, because I am a bit confused myself by these synchronization issues.
Thanks ;-)
Basically, you are assuming that callers invoke setFile first and then call parse. Consider this:
t1 (with setFile XX) and t2 (with setFile YY) arrive at the same time, and t2 sets the file to YY. Then t1 calls parse() and starts reading records from YY. No amount of synchronized is going to solve this for you; the only way out is to have the parse method take a File parameter, or to remove the singleton constraint (so that each thread has its own file object). So use:
public void parse(File file) // and add synchronized if you want.
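The parse(File) suggestion could look roughly like the sketch below. It is only an illustration under assumptions: StatelessCsvParser is a hypothetical name, each line is stored as a plain String instead of a MyObject, and generics are used for readability (on Java 1.4 they would have to be dropped).

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stateless variant of MyParser: it has no fields, so a shared
// singleton instance holds nothing that concurrent callers could overwrite.
final class StatelessCsvParser {

    // Each call builds and returns its own Set; nothing is shared between threads.
    Set<String> parse(File csvFile) throws IOException {
        Set<String> result = new HashSet<String>();
        BufferedReader reader = new BufferedReader(new FileReader(csvFile));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Placeholder for "new MyObject(...)": the raw line is stored as-is.
                result.add(line);
            }
        } finally {
            reader.close();
        }
        return result;
    }
}
```

Because the file is an argument and the result is a return value, two uploads handled at the same time cannot see each other's file or results, and no synchronized keyword is needed in the parser itself.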
I think there are multiple issues in this code.
If this class is a singleton, it should be stateless, i.e. it should hold no state. Having a setter for the file is therefore not the right thing to do. Pass the File object into the parse method and let it work on the argument. This should fix your issue of synchronizing across several methods.
Your myObjects Set is private, so I am assuming you are not passing it to any calling classes. If you are, always return a copy of the set so that callers cannot change the original.
Synchronizing on the object is good enough if all your changes to the set happen inside the synchronized block.
Use a separate MyParser object for every parse request and you will not have to deal with concurrency (at least not in MyParser). You will then also be able to truly serve multiple users at a time, without forcing them to wait or erasing the results of previous parsing jobs.
The singleton is mostly a red herring; it has nothing to do with the concurrency issues you are considering. As far as synchronization goes, I think you are OK. Making the method synchronized will also work even though myObjects is not static, because the class is a singleton.
I am writing a Raspberry Pi program for executing tasks at a given time. I wrote a TaskManager that keeps all tasks in a synchronized Map (awaitingTasks) and manages them. One of its methods is
addInTimeTask(...)
public static int addInTimeTask(Callable task, DateTime time) {
synchronized (awaitingTasks) {
final int id = assignNewId();
awaitingTasks.put(id, scheduledThreadPool.schedule(new CallableTask(task, new CallableTask.MyCallback() {
@Override
public void onFinish() {
awaitingTasks.remove(id);
}
}), TimeDiffCalc.secToDate(time), TimeUnit.SECONDS));
return id;
}
}
As you can see, a task (I am thinking of making it a class if it gets more attributes) has its own ID, a date, and the method that it executes.
I want to handle the situation where the server restarts and all in-time tasks simply disappear. I was thinking about holding tasks in a database. I can store the task ID and date, but how do I determine the method that a given task should execute?
I like the flexibility of this approach because I can make any method executable at a given time.
For example, here is a method from the RGBLed class (which has multiple methods that can be executed at a given time):
public int lightLed(final LedColor color, DateTime dateTime){
return TaskManager.addInTimeTask(new Callable<Void>() {
public Void call() throws Exception {
//here is code that makes led lighting
return null;
}
},dateTime);
}
What came to mind was to assign an ID to every method and then look the method up by ID, but I don't think that is possible.
I'll bet there have been many questions about similar problems, but I simply cannot find them. I could not phrase the question properly (so please edit it).
Thanks!
You are facing two problems. The one that you describe can be fixed "easily". You see, you know that you want to call specific methods.
Methods have names. Names are ... strings. So you could simply store that name as a string, and when you have some object in front of you, you can use Java reflection to invoke that particular method.
The other problem is that persisting your objects might not be that easy. If I read your examples correctly, you are mostly dealing with anonymous inner classes. Objects of such classes can be serialized too, but not as easily or as thoughtlessly as objects of normal classes (see here for example).
So, my suggestions:
Don't use inner classes; use ordinary classes instead (although that might affect the layout of existing code to a great degree), and serialize objects of those classes.
Together with the serialized object, remember the method name (and probably the arguments you need) so you can call the method by name.
It would probably make sense to create a class specifically for that purpose, containing two fields: the actual object to be serialized, and the name of the method to call on it.
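The suggestions above can be sketched as follows. This is a minimal illustration, not a complete persistence solution: NamedTask and DemoLed are hypothetical names, only public no-argument methods are handled, and how the serialized bytes reach the database is left out.

```java
import java.io.Serializable;
import java.lang.reflect.Method;

// Hypothetical holder with the two fields described above: the object to
// serialize and the name of the method to call on it.
final class NamedTask implements Serializable {
    private static final long serialVersionUID = 1L;

    private final Serializable target;
    private final String methodName;

    NamedTask(Serializable target, String methodName) {
        this.target = target;
        this.methodName = methodName;
    }

    // Looks up the public no-argument method by name and invokes it on the target.
    Object run() throws Exception {
        Method method = target.getClass().getMethod(methodName);
        return method.invoke(target);
    }
}

// Hypothetical stand-in for a class such as RGBLed: an ordinary named class,
// not an anonymous inner class, so it serializes without surprises.
final class DemoLed implements Serializable {
    private static final long serialVersionUID = 1L;

    public String lightRed() {
        // Real code would drive the LED here.
        return "red";
    }
}
```

A task would then be created with something like new NamedTask(new DemoLed(), "lightRed"), written out with an ObjectOutputStream next to its ID and date, and re-scheduled after a restart by reading it back and calling run().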
Assuming we have a method which calls another.
public void readXMLFile() {
// read each line and parse it to get a node
Node node = parse(line);
}
private Node parse(String line) {
}
Now, is it good practice to use a more comprehensive function name like "readXMLFileAndParse"?
Pros:
It gives the caller more complete information about what the function is supposed to be doing.
Otherwise the client may see that it reads the file and wonder where the "parse" utility is.
In other words, I see a clear advantage in a function name that covers all the activities nested within it. Is this the right thing to do, i.e. is it considered good practice?
It's a guideline that every method should have only one job (single responsibility).
However, this causes naming problems when a method returns a result that combines several sub-methods.
Therefore you should name it after its primary function: parsing a file. Reading the file is part of that, but it's not vital to the end user since it's implied.
Then again, you have to think about what this actually entails: nobody parses a file just to parse it. Do you retrieve data? Do you write data?
You should describe your actions on that file, but not as literally as 'readfile' or 'parsefile'.
RetrieveCustomers if you're reading customers would be a lot more descriptive.
public List<Customer> RetrieveCustomers() {
// loop over lines
// call parser
}
private Customer ParseCustomer() { }
If you'd share what exactly it is you're trying to parse, that would help a lot.
I think it depends on the complexity of your class. Since the method is private, no one, in theory, should care. Name it descriptively enough that you can read your own code six months from now, and stop there.
Public methods, on the other hand, should be well named and well documented. Extra descriptiveness there can't hurt.
My question is very simple: when using IndexReader.openIfChanged(reader) to replace the previous reader, how do I safely close oldReader?
Here is the code (using Lucene 3.5):
IndexReader newReader=IndexReader.openIfChanged(reader);
if(newReader!=null){
IndexReader oldReader=reader;
IndexSearcher oldSearcher=searcher;
reader=newReader;
searcher=new IndexSearcher(newReader);
oldSearcher.close();
oldReader.close();//or oldReader.decRef(),result is the same
}
This code runs in a daemon thread every 5 seconds.
The IndexReader instance (the reader object) is globally unique.
Since this change, I get an exception:
org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed
at org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:297)
at org.apache.lucene.index.IndexReader.getSequentialSubReaders(IndexReader.java:1622)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:98)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:577)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:517)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:487)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:400)
at org.zenofo.index.IndexManager.query(IndexManager.java:392)
...
IndexManager.java:392 uses the reader object (the globally unique IndexReader instance).
The IndexManager.query method handles a large number of concurrent requests, and all of them use the globally unique IndexReader instance (the reader object).
I need to close oldReader because of:
Too many open files in Lucene Indexing when number of users increase
Lucene Wiki: Too many open files
Reference:
API-IndexReader
API-IndexSearcher
How do I solve this problem?
Look at NRTManager and SearcherManager. You really don't have to handle this yourself.
You need to impose the happens-before relationship between writes to the public static vars and subsequent reads of them from other threads. If you use more than one var, you'll have the issue of atomicity so I recommend you use only one var since that is all you need.
Simply, this would work for you:
public class SearcherManager
{
public static volatile IndexSearcher searcher;
private static void reopen() {
// your code, just without assignment to reader
}
}
The key is the volatile modifier. Be sure to fully initialize everything before writing to the var, but do the closing of old objects after the write—in other words, just make sure you go on doing it the way you are doing it now :)
But, as @MJB notes in his answer, you should really not be doing this since it is all built into Lucene. Check out the Javadoc on NRTManagerReopenThread to get all the info you need, including a full code sample.
I assume the searcher (later referred to as oldSearcher) is working on reader (oldReader). In that case, closing the searcher also closes the reader it uses, so you don't need to close the reader separately;
oldSearcher.close() is enough.
I don't see what oldReader and oldSearcher are doing at all.
Can't you just remove them, along with their close() calls?
If you still need them, then my bet is that oldSearcher is somehow tied to oldReader, so calling close() on oldSearcher also closes oldReader, which is why you get the exception.
Is that the whole code, or did you simplify it? If it is the whole code, then just remove oldReader and oldSearcher altogether.
Cheers
Take a look at the IndexReader reference counting methods.
That is, increase the reference count with reader.incRef() when you instantiate a new IndexSearcher, and decrease it with reader.decRef() when you're done with the search results, preferably in the finally block of a try statement.
reader.decRef() automatically closes the reader when the reference count reaches 0.
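To illustrate the reference counting idea without depending on Lucene, here is a self-contained sketch. CountedResource and ResourceHolder are hypothetical stand-ins that only mimic the incRef/decRef pattern; they are not Lucene classes, and a real implementation would release file handles where the sketch sets a flag.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a reader with a reference count.
final class CountedResource {
    // Starts at 1: the holder that owns the resource keeps one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed;

    // Returns false if the count already reached zero, so the caller can retry.
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) {
                return true;
            }
        }
        return false;
    }

    // Closes the resource when the last reference is dropped.
    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // real code would release files here
        }
    }

    boolean isClosed() {
        return closed;
    }
}

final class ResourceHolder {
    private volatile CountedResource current = new CountedResource();

    // Query threads call acquire() before searching and release() in a finally block.
    CountedResource acquire() {
        while (true) {
            CountedResource resource = current;
            if (resource.tryIncRef()) {
                return resource;
            }
        }
    }

    void release(CountedResource resource) {
        resource.decRef();
    }

    // The reopen thread publishes the new resource first, then drops the
    // holder's reference to the old one; it closes once in-flight users finish.
    void swap(CountedResource next) {
        CountedResource old = current;
        current = next;
        old.decRef();
    }
}
```

With this shape, a search that started before the swap keeps a usable resource until it calls release, which is the situation that produces AlreadyClosedException when the old reader is closed immediately.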
I wrote a Java class which reads a file and stores each line in an ArrayList. I want to access this ArrayList a large number of times. Every time the class is called to access the ArrayList, it reads the file again. I want the file to be read once and the ArrayList then accessed multiple times. How can I do this?
Store it in a field of the class. I.e.:
public class Foo {
private List<String> list;
public List<String> readData() {
if (list != null) {
return list;
}
// do the reading, store the result in list, then return list
}
}
Note that if this is used in a multithreaded environment you'd have to take extra measures, for example by making the method synchronized.
As Peter noted, if you can read multiple files, then you can use a Map<String, List<String>>
Another note is that you should use only one instance of this class. If you create multiple instances you won't have the desired effect.
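A minimal sketch of the cached field with a synchronized accessor might look like this. CachedFileLines and getLoadCount are hypothetical names added for illustration, and the file is assumed to be UTF-8 text that does not change after the first read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical cache: the file is read on the first call only.
final class CachedFileLines {
    private final Path path;
    private List<String> lines; // null until the first successful read
    private int loadCount;      // illustration only: counts real reads

    CachedFileLines(Path path) {
        this.path = path;
    }

    // synchronized so that two threads cannot both see null and read twice.
    synchronized List<String> getLines() throws IOException {
        if (lines == null) {
            lines = Files.readAllLines(path);
            loadCount++;
        }
        return lines;
    }

    synchronized int getLoadCount() {
        return loadCount;
    }
}
```

The caching only helps if every caller uses the same CachedFileLines instance, as the note above says.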
It sounds like you should be reading the file on construction of the class rather than when accessing it. That doesn't necessarily mean in the constructor, mind you - you may well want to have a static factory method that reads the files into an ArrayList, and then passes that list to the real constructor. This would make the class easier to test (and use in other tests).
Then you only need to create the class once, and make the rest of your code use the same instance. Note that this doesn't require use of the singleton pattern, which would itself make testing harder. It just means propagating the instance to all the code that needs it.
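The static factory idea described above could be sketched like this. LineStore and fromFile are hypothetical names; the point is only that the constructor receives data, while the factory method is the single place that touches the disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder whose lines are fixed at construction time.
final class LineStore {
    private final List<String> lines;

    // The "real" constructor takes a list, so tests can supply data directly.
    LineStore(List<String> lines) {
        // Defensive copy wrapped as read-only, so the shared instance cannot be altered.
        this.lines = Collections.unmodifiableList(new ArrayList<String>(lines));
    }

    // Static factory: reads the file once and hands the result to the constructor.
    static LineStore fromFile(Path path) throws IOException {
        return new LineStore(Files.readAllLines(path));
    }

    List<String> getLines() {
        return lines;
    }
}
```

Application code would call LineStore.fromFile(...) once at startup and pass the resulting instance to whatever needs it, while a unit test can simply write new LineStore(Arrays.asList("a", "b")).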
Maybe you need to make it a singleton? Then you will read the file only once, when the single instance of the class is created.
If it's a web application, you might consider storing the list in the ServletContext or in the user's HttpSession, depending on how often the file changes.
I was looking through some Java code and this just does not seem right. To me, it looks like every time you call projects you get a new HashMap, so this condition is always false:
projects.get(soapFileName) != null
Seems like it should have a backing field
public static HashMap<String,WsdlProject> projects = new HashMap<String,WsdlProject>();
public Object[] argumentsFromCallSoapui(CallT call, Vector<String> soapuiFiles, HashMap theDPLs,int messageSize)
{
try {
for (String soapFileName:soapuiFiles){
System.out.println("Trying "+soapFileName);
WsdlProject project ;
if (projects.get(soapFileName) != null){
project = projects.get(soapFileName);
} else {
project = new WsdlProject(soapFileName);
projects.put(soapFileName,project);
}
}
} ...
}
Nope. In Java that static variable only gets initialized once.
So, this line will only get called once.
public static HashMap<String,WsdlProject> projects = new HashMap<String,WsdlProject> ();
The projects variable will be initialized once, when the class first loads.
Generally, static maps of this sort are a bad idea: they often turn into memory leaks, as you hold entries long past their useful life.
In this particular case, I'd also worry about thread safety. If you have multiple threads calling this method (which is likely in code dealing with web services), you'll need to synchronize access to the map or you could corrupt it.
And, in a general stylistic note, it's a good idea to define variables using the least restrictive class: in this case, the interface Map, rather than the concrete class HashMap.
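One possible way to address the thread-safety and Map-interface points is sketched below. Project is a hypothetical stand-in for WsdlProject so that the example is self-contained, and ProjectCache is an invented name; the memory-leak concern still applies, since nothing ever removes entries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for WsdlProject.
final class Project {
    final String fileName;

    Project(String fileName) {
        this.fileName = fileName;
    }
}

final class ProjectCache {
    // Declared as the Map interface; the ConcurrentHashMap instance behind it
    // is safe to use from several threads without external locking.
    private static final Map<String, Project> PROJECTS =
            new ConcurrentHashMap<String, Project>();

    private ProjectCache() {
    }

    // computeIfAbsent creates the project at most once per file name,
    // replacing the separate "get, check for null, put" steps.
    static Project get(String fileName) {
        return PROJECTS.computeIfAbsent(fileName, Project::new);
    }
}
```

Repeated calls with the same file name return the same Project instance, which is the behavior the original get/put code was aiming for.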
You don't call projects - it's a field, not a method.
As it's a static field, it will be initialized exactly once (modulo the same type being loaded in multiple classloaders).
If you add a static initialiser (static constructor?), you'll be able to see that statics are initialised only once, the first time the class is loaded:
public class Hello {
static { System.out.println("Hello static World!"); }
...
}
You won't get a new HashMap every time you invoke a method on projects, if that's what you are referring to. A new HashMap will be created once; however, all instances of the class will share that single HashMap.