How to access a shared file from multiple threads most effectively?


I'm developing a small web app whose servlets periodically access a shared resource: a simple text file on the server side holding some lines of mutable data. Most of the time, servlets just read the file, but some servlets may also update it, adding new lines or removing and replacing existing ones. Although the file contents are not updated very often, there is still a small chance of data inconsistency and file corruption if two or more servlets read and write the file at the same time.
The first goal is to make the file reading/writing safe. For this purpose, I've created a helper FileReaderWriter class providing some static methods for thread-safe file access. The read and write methods are coordinated by a ReentrantReadWriteLock. The rule is quite simple: multiple threads may read from the file at any time as long as no other thread is writing to it.
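import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FileReaderWriter {
    // One lock shared by all callers, whatever path they pass in
    private static final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    // Many threads may hold the read lock at once
    public static List<String> read(Path path) {
        List<String> list = new ArrayList<>();
        rwLock.readLock().lock();
        try {
            list = Files.readAllLines(path);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            rwLock.readLock().unlock();
        }
        return list;
    }

    // The write lock excludes all readers and other writers
    public static void write(Path path, List<String> list) {
        rwLock.writeLock().lock();
        try {
            Files.write(path, list);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            rwLock.writeLock().unlock();
        }
    }
}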
Then, every servlet may use the above method for file reading like this:
String dataDir = getServletContext().getInitParameter("data-directory");
Path filePath = Paths.get(dataDir, "test.txt");
List<String> list = FileReaderWriter.read(filePath);
Similarly, writing may be done with the FileReaderWriter.write(filePath, list) method. Note: if some data needs to be replaced or removed (which means fetching the data from the file, processing it and writing the updated data back), then the whole code path for this operation should be guarded by rwLock.writeLock() for atomicity.
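The read-modify-write case from the note above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the class name AtomicFileUpdater and its update method are hypothetical and not part of the original helper, and errors are rethrown as UncheckedIOException rather than printed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.UnaryOperator;

// Hypothetical helper (not from the original post): one lock guards the file within this JVM.
final class AtomicFileUpdater {
    private static final ReentrantReadWriteLock RW_LOCK = new ReentrantReadWriteLock();

    private AtomicFileUpdater() {}

    // Plain read under the shared read lock.
    static List<String> read(Path path) {
        RW_LOCK.readLock().lock();
        try {
            return Files.readAllLines(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            RW_LOCK.readLock().unlock();
        }
    }

    // Reads, transforms and writes back while holding the write lock the whole time,
    // so no other thread can observe or interleave with a half-finished update.
    static List<String> update(Path path, UnaryOperator<List<String>> change) {
        RW_LOCK.writeLock().lock();
        try {
            List<String> current = new ArrayList<>(Files.readAllLines(path));
            List<String> updated = change.apply(current);
            Files.write(path, updated);
            return updated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            RW_LOCK.writeLock().unlock();
        }
    }
}
```

A caller would pass the modification as a function, for example AtomicFileUpdater.update(filePath, lines -> { lines.remove("old"); lines.add("new"); return lines; }).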
Now that access to the shared file seems to be safe (at least, I hope so), the next step is to make it fast. From a scalability perspective, reading the file on every user request to the servlet doesn't sound reasonable. So what I thought of is to read the contents of the file into an ArrayList (or another collection) only once, at context initialization time, and then share this ArrayList (not the file) as a context-scoped data-holder attribute. The context-scoped attribute can then be shared by servlets with the same locking mechanism as described above, and the contents of the updated ArrayList may be independently stored back to the file on some regular basis.
Another solution (in order to avoid locking) would be to use a CopyOnWriteArrayList (or some other collection from the java.util.concurrent package) to hold the shared data and designate a single-threaded ExecutorService to dump its contents into the file when needed. I have also heard of Java memory-mapped files for mapping the entire file into memory, but I'm not sure whether such an approach is appropriate for this particular situation.
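The CopyOnWriteArrayList idea could look roughly like the sketch below. The class name CachedLines and its methods are hypothetical, chosen only for illustration, and the sketch assumes the file is small enough to rewrite in full after each change.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical cache: reads hit memory, writes are dumped to disk by a single thread.
final class CachedLines implements AutoCloseable {
    private final Path file;
    private final List<String> lines;
    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    CachedLines(Path file) throws IOException {
        this.file = file;
        // Load the file once; later reads never touch the disk.
        this.lines = new CopyOnWriteArrayList<>(Files.readAllLines(file));
    }

    // Returns a copy so callers cannot modify the shared list.
    List<String> snapshot() {
        return new ArrayList<>(lines);
    }

    // Updates memory immediately and queues a dump; dumps run one at a time, in order.
    Future<?> add(String line) {
        lines.add(line);
        return writer.submit(this::dump);
    }

    private void dump() {
        try {
            // Iterating a CopyOnWriteArrayList sees a consistent snapshot.
            Files.write(file, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        writer.shutdown();
    }
}
```

Note that single calls such as add are thread-safe here, but compound operations (check, then replace) would still need extra coordination.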
So, could anybody please guide me through the most effective ways (maybe suggesting some other alternatives) to solve the problem of shared file access, given that writes to the file are quite infrequent and its contents are not expected to exceed a few dozen lines?

You don't explain your real problem, only your current attempt, so it is difficult to provide a good solution.
Your approach has two serious problems:
Problem 1: concurrency
"a shared resource which is a simple text-file on the server side holding some lines of mutable data"
90% of the solution to a problem is a good data structure. A mutable file is not one. Even popular database engines have important concurrency limitations (e.g. SQLite), so don't try to reinvent the wheel.
Problem 2: horizontal scalability
Even if you solve your local concurrency problems (e.g. with synchronized methods), you won't be able to deploy multiple instances (nodes/servers) of your application.
Solution 1: use the right tool for the job
You don't explain exactly the nature of your (data management) problem but probably any NoSQL database will do you good (reading about MongoDB can be a good starting point).
(Bad) solution 2: use FileLock
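If for some reason you insist on doing what you describe, use low-level file locks via FileLock. You will only have to deal with partial file locks, and these also work across processes, so they can be distributed horizontally when the file lives on a shared file system that supports locking. Note that a FileLock is held on behalf of the whole JVM, so threads inside one JVM still have to coordinate among themselves, for example with the lock you already have.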
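A minimal sketch of the FileLock approach, assuming a hypothetical helper named LockedAppender: it takes an exclusive lock on the whole file, appends one line and releases the lock. FileChannel.lock() guards against other processes; calling it from two threads of the same JVM on the same file raises OverlappingFileLockException instead of blocking.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends a line while holding an exclusive OS-level file lock.
final class LockedAppender {
    private LockedAppender() {}

    static void appendLine(Path path, String line) throws IOException {
        // Resources close in reverse order: the lock is released, then the channel is closed.
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            channel.position(channel.size()); // move to the end of the file
            ByteBuffer data = ByteBuffer.wrap(
                    (line + System.lineSeparator()).getBytes(StandardCharsets.UTF_8));
            while (data.hasRemaining()) {
                channel.write(data);
            }
            channel.force(true); // push the change to the storage device before unlocking
        }
    }
}
```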
(Limited) solution 3: in memory structure
If you don't need horizontal scalability, you can use a shared in-memory structure such as ConcurrentHashMap, but you could lose updates if you do not persist the information before the application stops.
Conclusion
Although there are more exotic distributed data models, using a database for even a single table may be the best and simplest solution.

Related

What is the right way to close and/or delete a memory mapped file?

From what I've read, it's a bit tricky closing a memory mapped file in Java.
By default, they're closed only by a mechanism akin to (but more efficient than) finalization.
I already know it's possible to close them explicitly in an implementation-specific way (but one common to both OpenJDK and the Oracle JDK) by using reflection, using the following code (inspired by this answer to a related question):
try {
    Method cleanerMethod = buffer.getClass().getMethod("cleaner");
    cleanerMethod.setAccessible(true);
    Object cleaner = cleanerMethod.invoke(buffer);
    Method cleanMethod = cleaner.getClass().getMethod("clean");
    cleanMethod.setAccessible(true);
    cleanMethod.invoke(cleaner);
} catch (Exception ex) { /* log exception */ }
I gather from the discussion in that question that it's not reliably possible to delete the backing file for a MappedByteBuffer without closing the buffer in this manner.
However, there are also other, related resources that must be closed: the RandomAccessFile and the FileChannel that were used to create the MappedByteBuffer.
Does the order in which these resources are closed matter? Are there any differences between the order in which they must be closed on Mac/Windows/Linux?
Ultimately, what I want to know how to do safely comes down to these two questions:
What is the correct way to close a MappedByteBuffer (and related resources) and ensure the backing file is saved?
Is there a way to close a MappedByteBuffer (and related resources) without accidentally causing it to write uncommitted changes to the disk, when the goal is to quickly delete the backing file?

Java In-Application Storage from Xml Data

Hi, I am developing a game in Java for learning purposes and have a question regarding in-app handling of data.
Say I have a bunch of XML files storing a variety of statistics for weapons, armours etc., and I parse these files into my application as lists.
Note that this data will never change and is effectively "final": it is constant data that will be used by the game.
Now my question is: would you store this data (for use while the game is running) in a class within your application (in my example here, a singleton) that you can access easily? For example, something like this (where the list has been read elsewhere by the XML parser):
public class WeaponData {
    // Eagerly created singleton; parseXml(...) is the XML parsing routine defined elsewhere
    private static final WeaponData instance = new WeaponData(parseXml("weapons"));

    private final List<Weapon> weaponData;

    private WeaponData(List<Weapon> data) {
        weaponData = data;
    }

    public static WeaponData getInstance() {
        return instance;
    }

    public List<Weapon> getWeaponData() {
        return weaponData;
    }
}
This allows me to use the data with
WeaponData.getInstance().getWeaponData();
If so, is this the way to go about it, or is there a better way?
The only alternative I can think of is to keep reading the XML whenever it's needed, which seems unwise, particularly given that this data retrieval may (for a commercial application at least) be a network operation, even if in my case it is just a hard-disk read. Plus, if nothing else, it would mean repeated code.
Thanks for your time!

You should probably read it in once and save it, but don't store it as a singleton. Read the WeaponData during initialization and store an instance of it. If you don't like that you are welcome to use the pattern you suggested, it just might be awkward later.
If you ever get to where WeaponData might be updated while the app is running, you may want to re-read it, but not every time.
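Reading the data once and handing the instance to whoever needs it could look like the sketch below. All names and fields here (Weapon with name and damage, Game, totalDamage) are hypothetical stand-ins, since the question does not show them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value type; the real Weapon class is not shown in the question.
final class Weapon {
    final String name;
    final int damage;

    Weapon(String name, int damage) {
        this.name = name;
        this.damage = damage;
    }
}

// Holds the parsed data as an ordinary, immutable object instead of a singleton.
final class WeaponData {
    private final List<Weapon> weapons;

    WeaponData(List<Weapon> weapons) {
        // Defensive copy plus unmodifiable view, since the data is meant to be constant.
        this.weapons = Collections.unmodifiableList(new ArrayList<>(weapons));
    }

    List<Weapon> getWeapons() {
        return weapons;
    }
}

// The game receives the data at construction time rather than reaching for a global.
final class Game {
    private final WeaponData weaponData;

    Game(WeaponData weaponData) {
        this.weaponData = weaponData;
    }

    int totalDamage() {
        int sum = 0;
        for (Weapon w : weaponData.getWeapons()) {
            sum += w.damage;
        }
        return sum;
    }
}
```

One advantage of this shape is that tests can build a Game from a small hand-written list without touching any XML.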

I think your approach will work, especially as you have mentioned that the data won't become large enough to strain memory. Another concern would be the frequency with which you must read the data. There's a tradeoff between those two considerations, but it sounds like the small XML size warrants unmarshalling the XML into objects immediately.
Just for completeness, you should be using JAXB to generate Java classes from your XML schema, rather than rolling your own parser.
If you fear that the XML data may change, you might consider using the WatchService API to detect changes and re-parse the XML file.

How to keep internal/hidden Database connection open when calling JAR?

I have to use an already developed JAR in order to keep my code running. The JAR provides the functionality of transforming file formats. My code looks something like this:
public class Transformer {
    //some fields
    //constructor
    public List<MyFile> tranformFiles(List<MyFile> files) {
        JarClassUsed used = new JarClassUsed();
        List<MyFile> data = new ArrayList<>();
        for (MyFile file : files) {
            data.add(used.TransformFileFormat(file));
        }
        return data;
    }
}
It works fine, but it is still very slow. The problem is that the JAR I have to use makes a hidden/internal DB connection call when transforming each file (to read predefined configurations). When the file count is around 100 this is acceptable, but I have one case with more than 1000, and as you have probably guessed, that is quite problematic.
How can I keep this hidden connection open? DB connections are expensive to create, and a singleton or static field sounds like a good solution; it's not smart to create a connection for each file being processed. Can I use reflection somehow here?

So we talked about it in the office... it seems the best course of action is to find a library that handles the same formats; if that's not possible, externalize the configuration and use a different library.
I know this does not answer your question in any way... and you probably already knew that... but unless you have the raw source and are allowed to change that jar, the "hidden" connection is something written into the library.
I really wish I had better advice, but I think the best course of action is to investigate other implementations.
If you can list the formats I would be more than happy to assist in this investigation in a different medium.

Storing state in Java

Broad discussion question.
Are there any libraries already which allow me to store the state of execution of my application in Java?
E.g. I have an application which processes files, and the application may be forced to shut down suddenly at some point. I want to store the information on which files have been processed and which have not, and what stage the processing had reached for the files in progress.
Are there already any libraries which abstract this functionality or I would have to implement it from scratch?

It seems like what you are looking for is serialization which can be performed with the Java Serialization API.
You can write even less code if you decide to use a well-known library such as Apache Commons Lang and its SerializationUtils class, which is itself built on top of the Java Serialization API.
Using the latter, serializing/deserializing your application state to and from a file is done in a few lines.
The only thing you have to do is create a class holding your application state, let's call it... ApplicationState :-) It could look like this:
// Must implement java.io.Serializable to be usable with the Java Serialization API
class ApplicationState implements Serializable {
    enum ProcessState {
        READ_DONE,
        PROCESSING_STARTED,
        PROCESSING_ENDED,
        ANOTHER_STATE;
    }
    private List<String> filesDone, filesToDo;
    private String currentlyProcessingFile;
    private ProcessState currentProcessState;
}
With such a structure, and using SerializationUtils, serializing is done the following way:
try {
    ApplicationState state = new ApplicationState();
    ...
    // File to serialize the object to
    String fileName = "applicationState.ser";
    // New file output stream for the file
    FileOutputStream fos = new FileOutputStream(fileName);
    // Serialize the application state
    SerializationUtils.serialize(state, fos);
    fos.close();
    // Open a FileInputStream on the file
    FileInputStream fis = new FileInputStream(fileName);
    // Deserialize and cast back to ApplicationState
    ApplicationState restored = (ApplicationState) SerializationUtils.deserialize(fis);
    System.out.println(restored);
    fis.close();
} catch (Exception e) {
    e.printStackTrace();
}
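For comparison, the same round trip can be sketched with the JDK alone, without Commons Lang. The names ProcessingState and StateStore below are hypothetical, and the fields are a reduced version of the ApplicationState idea above.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical state holder; it must implement Serializable to be written by ObjectOutputStream.
final class ProcessingState implements Serializable {
    private static final long serialVersionUID = 1L;

    final List<String> filesDone = new ArrayList<>();
    final List<String> filesToDo = new ArrayList<>();
    String currentFile;
}

// Hypothetical helper that saves and restores the state with plain Java serialization.
final class StateStore {
    private StateStore() {}

    static void save(ProcessingState state, Path target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(target))) {
            out.writeObject(state);
        }
    }

    static ProcessingState load(Path source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(source))) {
            return (ProcessingState) in.readObject();
        }
    }
}
```

Saving after each processed file, rather than only at shutdown, is what makes the state survive a sudden termination.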

It sounds like the Java Preferences API might be a good option for you. This can store user/system settings with minimal effort on your part and you can update/retrieve at any time.
https://docs.oracle.com/javase/8/docs/technotes/guides/preferences/index.html
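A minimal sketch of how the Preferences API might track per-file progress. The class ProgressPrefs and the node path passed to it are hypothetical; the sketch also assumes file names fit within Preferences.MAX_KEY_LENGTH (80 characters), since longer keys are rejected.

```java
import java.util.prefs.Preferences;

// Hypothetical wrapper: one boolean preference per processed file name.
final class ProgressPrefs {
    private final Preferences node;

    ProgressPrefs(String nodePath) {
        // Per-user preference node; the backing store is platform-specific.
        this.node = Preferences.userRoot().node(nodePath);
    }

    // Marks a file as processed; keys longer than Preferences.MAX_KEY_LENGTH are rejected.
    void markDone(String fileName) {
        node.putBoolean(fileName, true);
    }

    // Files that were never marked fall back to the default value false.
    boolean isDone(String fileName) {
        return node.getBoolean(fileName, false);
    }
}
```

Changes reach the backing store asynchronously; calling flush() on the node would be needed where an immediate write matters.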

It's pretty simple to make from scratch. You could follow this:
Have a DB (or just a file) that stores the information of processing progress. Something like:
Id|fileName|status|metadata
As soon as you start processing a file, make an entry in this table and mark its status as PROCESSING. Then you can store intermediate states, and finally, when you're done, set the status to DONE. This way, on restart, you know which files were processed, which files were in progress when the process shut down or crashed, and (obviously) where to start.
In large enterprise environments where applications are loosely coupled (and there is no guarantee that an application will be available or won't crash), we use a message queue to do something similar and keep the architecture reliable.
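The status table described above could be approximated with an append-only journal file, as in this sketch. ProgressJournal is a hypothetical name, and the record is simplified to fileName|status (no id or metadata column); the last record for a file wins when the journal is read back.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical journal: every status change is appended as one "fileName|status" line.
final class ProgressJournal {
    private final Path journal;

    ProgressJournal(Path journal) {
        this.journal = journal;
    }

    // Appends a record; existing records are never rewritten.
    void record(String fileName, String status) throws IOException {
        String line = fileName + "|" + status;
        Files.write(journal, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replays the journal on restart; later records overwrite earlier ones for the same file.
    Map<String, String> latestStatuses() throws IOException {
        Map<String, String> latest = new LinkedHashMap<>();
        if (Files.exists(journal)) {
            for (String line : Files.readAllLines(journal, StandardCharsets.UTF_8)) {
                int sep = line.lastIndexOf('|');
                if (sep > 0) {
                    latest.put(line.substring(0, sep), line.substring(sep + 1));
                }
            }
        }
        return latest;
    }
}
```

On restart, any file whose latest status is still PROCESSING is one that was interrupted and needs to be resumed or redone.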

There are almost too many ways to mention. I would choose the option you believe is simplest.
You can use:
a file to record what is done (and what is to be done)
a persistent queue on JMS (which supports multiple processes, even on different machines)
an embedded or remote database.
An approach I rave about is using memory mapped files. A nice feature is that information is not lost if the application dies or is killed (provided the OS doesn't crash) which means you don't have to flush it, nor worry about losing data if you don't.
This works because the data is partly managed by the OS which means it uses little heap (even for TB of data) and the OS deals with loading and flushing to disk making it much faster (and making sizes much larger than your main memory practical).
BTW: This approach works even with a kill -9 as the OS flushes the data to disk. To test this I use Unsafe.getByte(0) which crashes the application with a SEG fault immediately after making a change (as in the next machine code instruction) and it still writes the change to disk.
This won't work if you pull the power, but you have to be really quick. You can use memory mapped files to force the data to disk before continuing, but I don't know how you can test this really works. ;)
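As a rough illustration of the memory-mapped approach, the sketch below keeps a single counter in a mapped file and calls MappedByteBuffer.force() to ask the OS to write the change to the storage device. MappedCounter is a hypothetical name, and the sketch assumes the file is either new or was created by this same helper.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: a progress counter stored in the first 8 bytes of a mapped file.
final class MappedCounter {
    private MappedCounter() {}

    static long increment(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Make sure the mapped region exists in the file before mapping it.
            if (channel.size() < Long.BYTES) {
                channel.write(ByteBuffer.allocate(Long.BYTES), 0);
            }
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            long next = buffer.getLong(0) + 1;
            buffer.putLong(0, next); // visible to the OS page cache immediately
            buffer.force();          // additionally request a write to the storage device
            return next;
        }
    }
}
```

Without the force() call the update would still survive a crash of the JVM alone, which is the property described above; force() is for the stronger case of an OS crash or power loss.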
I have a library which could make memory mapped files easier to use
https://github.com/peter-lawrey/Java-Chronicle
It's not a long read and you can use it as an example.

Apache Commons Configuration API: http://commons.apache.org/proper/commons-configuration/userguide/howto_filebased.html#File-based_Configurations

Retrieving Large Lists of Objects Using Java EE

Is there a generally-accepted way to return a large list of objects using Java EE?
For example, if you had a database ResultSet that had millions of objects how would you return those objects to a (remote) client application?
Another example -- that is closer to what I'm actually doing -- would be to aggregate data from hundreds of sources, normalize it, and incrementally transfer it to a client system as a single "list".
Since all the data cannot fit in memory, I was thinking that a combination of a stateful SessionBean and some sort of custom Iterator that called back to the server would do the trick.
So, in other words, if I have an API like Iterator<Data> getData() then what's a good way to implement getData() and Iterator<Data>?
How have you successfully solved this problem in the past?

Definitely don't duplicate the entire DB in Java's memory. This makes no sense and only makes things unnecessarily slow and memory-hogging. Rather, introduce pagination at the database level. You should query only the data you actually need to display on the current page, as Google does.
If you actually have a hard time in implementing this properly and/or figuring the SQL query for the specific database, then have a look at this answer. For JPA/Hibernate equivalent, have a look at this answer.
Update as per the comments (which actually changes the entire question subject...), here's a basic (pseudo) kickoff example:
List<Source> inputSources = createItSomehow();
Source outputSource = createItSomehow();
for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        outputSource.write(inputSource.read());
    }
}
This way you effectively end up with a single entry in Java's memory instead of the entire collection as in the following (inefficient) example:
List<Source> inputSources = createItSomehow();
List<Entry> entries = new ArrayList<Entry>();
for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        entries.add(inputSource.read());
    }
}
Source outputSource = createItSomehow();
for (Entry entry : entries) {
    outputSource.write(entry);
}
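A compilable variant of the streaming idea, with Iterator<String> and Writer standing in for the unspecified Source type, might look like the sketch below. StreamingCopy and copyAll are hypothetical names; only one entry is held in memory at a time.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: forwards entries from many sources to one output, one at a time.
final class StreamingCopy {
    private StreamingCopy() {}

    static long copyAll(List<Iterator<String>> sources, Writer out) throws IOException {
        long count = 0;
        for (Iterator<String> source : sources) {
            while (source.hasNext()) {
                // Each entry is written as soon as it is read, then becomes garbage.
                out.write(source.next());
                out.write('\n');
                count++;
            }
        }
        out.flush();
        return count;
    }
}
```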

Pagination is a good solution when working with a web-based UI. Sometimes, however, it is much more efficient to stream everything in one call. The rmiio library was written explicitly for this purpose, and is already known to work in a variety of app servers.

If your list is huge, you must assume that it can't fit in memory, or at least that if your server needs to handle many concurrent requests for it, you run a high risk of an OutOfMemoryError.
So basically, what you do is paging with batch reads. Let's say you load one thousand objects from your database and send them to the client in the response, then loop until you have processed all objects. (See the response from BalusC.)
The problem is the same on the client side, and you'll likely need to stream the data to the file system to prevent an OutOfMemoryError.
Please also note: it is okay to load millions of objects from a database as an administrative task, such as performing a backup or an export for some exceptional case. But you should not offer it as a request any user can make. It will be slow and drain server resources.
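The batch-reading loop can be hidden behind an Iterator, roughly as sketched below. PagedIterator is a hypothetical name, and the fetchPage function (offset, limit) is an assumed stand-in for whatever paged database query or remote call is actually used; it is expected to return fewer than limit rows only when the data is exhausted.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical iterator that pulls one page at a time and keeps only that page in memory.
final class PagedIterator<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchPage; // (offset, limit) -> rows
    private final int pageSize;
    private Iterator<T> current = Collections.emptyIterator();
    private int offset = 0;
    private boolean exhausted = false;

    PagedIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one is used up.
        while (!current.hasNext() && !exhausted) {
            List<T> page = fetchPage.apply(offset, pageSize);
            offset += page.size();
            exhausted = page.size() < pageSize; // a short page means no more data
            current = page.iterator();
        }
        return current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With a page size of 1000, at most 1000 objects are alive on the consuming side at any moment, regardless of the total row count.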
