I have a requirement to implement a watch service on a folder. The straightforward approach is to use Java 7's WatchService, and I have done this successfully: I can capture events whenever a file is created, updated, or deleted in the folder I am watching. The problem is that this does not cover the contents of subfolders, and the documentation states this clearly.
My requirement is to watch the contents of subfolders as well. That is not possible with the above approach unless I manually loop over all the subfolders and register a watch on each one, which I think could lead to resource leaks if not programmed carefully. So instead I went with what Spring suggests in its newer release, explained here. This is the clearest approach I have seen for WatchService.
The problem is that this listens only for ENTRY_CREATE events, i.e. only events where a file is created (which can be at any level). It does not work when I change or delete a file. How should I proceed in this case?
public static void watchFolderTree(String pathStr) throws Exception {
    long waitTime = 10000;
    WatchServiceDirectoryScanner scanner = new WatchServiceDirectoryScanner(pathStr);
    scanner.start();
    List<File> changedFiles = null;
    while (true) {
        changedFiles = scanner.listFiles(new File(pathStr));
        if (changedFiles.size() > 0) {
            System.out.println("There is a file");
        }
        Thread.sleep(waitTime);
    }
}
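For reference, here is a minimal sketch of the manual recursive-registration approach mentioned above: every directory in the tree is registered for all three event kinds, and newly created subdirectories are registered as they appear (the event-loop handling here is deliberately bare-bones):

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import static java.nio.file.StandardWatchEventKinds.*;

public class RecursiveWatcher {

    public static void watch(Path root) throws IOException, InterruptedException {
        WatchService watcher = FileSystems.getDefault().newWatchService();
        registerTree(root, watcher);
        while (true) {
            WatchKey key = watcher.take();
            for (WatchEvent<?> event : key.pollEvents()) {
                Path child = ((Path) key.watchable()).resolve((Path) event.context());
                System.out.println(event.kind() + ": " + child);
                // newly created subdirectories must be registered too
                if (event.kind() == ENTRY_CREATE && Files.isDirectory(child)) {
                    registerTree(child, watcher);
                }
            }
            key.reset();
        }
    }

    private static void registerTree(Path start, final WatchService watcher) throws IOException {
        Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}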
References:
Monitor subfolders with a Java watch service
JAVA 7 watch service
I am a newbie to Flink and facing some challenges in solving the use case below.
Use Case description:
I will receive a CSV file with a timestamp every single day in some folder, say input. The file name format will be file_name_dd-mm-yy-hh-mm-ss.csv.
Now my Flink pipeline will read this CSV file row by row, and each row will be written to my Kafka topic.
Immediately after the data has been read, the file needs to be moved to another folder, the historic folder.
Why I need this: suppose the Ververica server stops, either abruptly or manually. If all the processed files are still lying in the same location, then after the restart Flink will re-read all the files it had processed earlier. To prevent this scenario, already-read files need to be moved to another location immediately.
I googled a lot but did not find anything, so can you guide me on how to achieve this?
Let me know if anything else is required.
Out of the box, Flink provides the facility to monitor a directory for new files and read them via StreamExecutionEnvironment.getExecutionEnvironment().readFile (see similar Stack Overflow threads for examples: How to read newly added file in a directory in Flink, Monitoring directory for new files with Flink for data streams, etc.).
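For example, a minimal sketch of that built-in monitoring (the /data/input path is a placeholder):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
TextInputFormat format = new TextInputFormat(new Path("/data/input"));

// re-scan the directory every 10 seconds and emit newly found files line by line
DataStream<String> lines = env.readFile(
        format,
        "/data/input",
        FileProcessingMode.PROCESS_CONTINUOUSLY,
        10_000L);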
Looking into the source code, the readFile function calls the createFileInput() method, which simply instantiates ContinuousFileMonitoringFunction and ContinuousFileReaderOperatorFactory and configures the source:
addSource(monitoringFunction, sourceName, null, boundedness)
.transform("Split Reader: " + sourceName, typeInfo, factory);
ContinuousFileMonitoringFunction is where most of the logic happens.
So, if I were to implement your requirement, I would extend the functionality of ContinuousFileMonitoringFunction with my own logic for moving the processed file into the history folder, and construct the source from this function.
Given that the run method performs the read and forwarding inside the checkpointLock:
synchronized (checkpointLock) {
monitorDirAndForwardSplits(fileSystem, context);
}
I would say it is safe, on checkpoint completion, to move to the historic folder those files whose modification time is older than globalModificationTime, which is updated in monitorDirAndForwardSplits when splits are collected.
That said, I would extend the ContinuousFileMonitoringFunction class, implement the CheckpointListener interface, and in notifyCheckpointComplete move the already processed files to the historic folder:
public class ArchivingContinuousFileMonitoringFunction<OUT> extends ContinuousFileMonitoringFunction<OUT> implements CheckpointListener {
    ...

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        Map<Path, FileStatus> eligibleFiles = listEligibleForArchiveFiles(fs, new Path(path));
        // do move logic
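        // A possible sketch of the move (assumption: a "historic" directory on the
        // same FileSystem; fs.rename relocates each already-processed file there):
        // Path historicDir = new Path("/data/historic"); // hypothetical target path
        // for (Path filePath : eligibleFiles.keySet()) {
        //     fs.rename(filePath, new Path(historicDir, filePath.getName()));
        // }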
    }

    /**
     * Returns the paths of the files already processed.
     *
     * @param fileSystem The filesystem where the monitored directory resides.
     */
    private Map<Path, FileStatus> listEligibleForArchiveFiles(FileSystem fileSystem, Path path) {
        final FileStatus[] statuses;
        try {
            statuses = fileSystem.listStatus(path);
        } catch (IOException e) {
            // we may run into an IOException if files are moved while listing their status;
            // delay the check for eligible files in this case
            return Collections.emptyMap();
        }

        if (statuses == null) {
            LOG.warn("Path does not exist: {}", path);
            return Collections.emptyMap();
        } else {
            Map<Path, FileStatus> files = new HashMap<>();
            // handle the new files
            for (FileStatus status : statuses) {
                if (!status.isDir()) {
                    Path filePath = status.getPath();
                    long modificationTime = status.getModificationTime();
                    if (shouldIgnore(filePath, modificationTime)) {
                        files.put(filePath, status);
                    }
                } else if (format.getNestedFileEnumeration() && format.acceptFile(status)) {
                    files.putAll(listEligibleForArchiveFiles(fileSystem, status.getPath()));
                }
            }
            return files;
        }
    }
}
and then define the data stream manually with the custom function:
ContinuousFileMonitoringFunction<OUT> monitoringFunction =
    new ArchivingContinuousFileMonitoringFunction<>(
        inputFormat, monitoringMode, getParallelism(), interval);

ContinuousFileReaderOperatorFactory<OUT, TimestampedFileInputSplit> factory =
    new ContinuousFileReaderOperatorFactory<>(inputFormat);

final Boundedness boundedness = Boundedness.CONTINUOUS_UNBOUNDED;

env.addSource(monitoringFunction, sourceName, null, boundedness)
   .transform("Split Reader: " + sourceName, typeInfo, factory);
Flink itself does not provide a solution for doing this. You might need to build something yourself, or find a workflow tool that can be configured to handle this.
You can ask about this on the Flink user mailing list. I know others have written scripts to do this; perhaps someone can share a solution.
I want to open only one instance of a CHM file when clicking "Help" in the menu bar, and stop it from opening a second time when clicked again. How do I code that?
I've tried using Process.isAlive(), but after I close the viewer I want a counter reset to zero, so that another CHM instance is only opened when the counter is 0.
helpMenu.addMouseListener(new MouseAdapter() {
    @Override
    public void mouseClicked(MouseEvent e) {
        // do this after the click
        openCHM();
    }
});
So the MouseEvent is fired once.
void openCHM() {
    Process p;
    if (cnt == 0) {
        try {
            p = Runtime.getRuntime().exec("hh.exe Help.chm");
            cnt++;
            if (!p.isAlive()) {
                cnt = 0;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I expected the counter to go back to 0, but then realized that the MouseEvent had already fired and the code had already executed; the process is still alive at that moment, so it never enters the second if-statement and never resets my counter to 0.
EDIT
There is no correct answer for how to open a CHM file only once, but there is a workaround that makes it possible: we just need to check whether the file can be renamed or not.
protected void openCHM() {
    try {
        File file = new File("YOURNAME.chm");
        // renameTo fails while the file is open in the viewer, so a
        // successful rename means the file is not currently open
        boolean renamable = file.renameTo(file);
        if (renamable) {
            Runtime.getRuntime().exec("hh.exe YOURNAME.chm");
        } else if (!file.exists()) {
            // message: file doesn't exist (in path)
        } else {
            // file is already open
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
I'm not a Java programmer, but the short story is: not possible (AFAIK).
You know, hh.exe is the HTML Help executable and is associated with *.chm files. It's just a shell that uses the HTML Help API and is really just hosting a browser window.
HH.exe is not single-instance: if you open a CHM or another file three times using HH.exe, then three help windows will appear. Try it using PowerShell:
PS D:\_working> hh.exe C:\Windows\Help\htmlhelp.chm
Several client-side command line switches are available to help authors; they are part of the HTML Help executable (hh.exe) and therefore work even when HTML Help Workshop is not installed.
KeyHH.exe used to work years ago with special parameters.
If you call the HTML Help API directly from your application, and not via a second helper program like HH.exe or KeyHH.exe, then you MUST close any open help windows before shutting down the application, or you will probably crash Windows.
For some information related to the problem, you may be interested in Open CHM File Once.
Some quoted info from the link above:
When you do that, you are just calling the help viewer again and again from the command line; you're not using the HTML Help API, which is what you need to access the CHM once it is open. You need to check whether your flavors of Java and Smalltalk support calls to the HTML Help API. This API is documented in detail in the help file of Microsoft HTML Help Workshop, which is the compiler package you installed to be able to generate CHMs.
Currently I am struggling with the problem of a single-instance JavaFX application, packed into an .exe using install4j. The application should run on a Windows terminal server, and every user should only be able to run one instance of it. Meaning, Alice and Bob may use separate instances of the application, but Alice may only have one instance open.
Writing a lock file with the process ID is not a viable option, since the application targets Java 8, which has no consistent way to retrieve the process ID. Opening a socket is also not a desirable solution, as there can be many instances on the same host. Moreover, I suppose admins would not be that happy if some application randomly opened sockets on their server...
As I am using install4j to pack the application, I toggled the 'single instance only' feature, which seems to work well when connected via a full RDP session. However, the application may be deployed using the RemoteApp feature, which in some way circumvents install4j's checking mechanism, allowing one instance to be launched in an RDP session and another by using the RemoteApp.
This leads me to two questions:
How does the install4j check work? (I was not able to find any details...)
What would be the best solution to ensure a single instance per user at all times? (And also be failsafe, e.g. recover from JVM crashes)
Regarding the possibility of a FileLock: as different operating systems may handle file locks differently, can it be assured that the file lock is exclusively acquired by one JVM instance on the whole system?
Sockets will be a bit problematic if you want the application to run concurrently under different users.
Using an NIO FileLock is possible. Create the file under the user's home directory so that each user has their own lock file. The key thing here is to still try to acquire the file lock if the file already exists, by attempting to delete it before recreating it. This way, if the application crashed and the file is still there, you will still be able to acquire a lock on it. Remember that the OS releases all locks, open file handles, and system resources when a process terminates.
Something like this:
public class ExclusiveApplicationLock {

    private final File file;
    private final FileChannel channel;
    private final FileLock lock;

    private ExclusiveApplicationLock() throws Exception {
        String homeDir = System.getProperty("user.home");
        file = new File(homeDir + "/.myapp", "app.lock");
        file.getParentFile().mkdirs(); // make sure ~/.myapp exists
        if (file.exists()) {
            // try to delete a stale file from a previous run; this fails
            // if another instance still holds the lock on it
            file.delete();
        }
        channel = new RandomAccessFile(file, "rw").getChannel();
        lock = channel.tryLock();
        if (lock == null) {
            channel.close();
            throw new RuntimeException("Application already running.");
        }
        Runtime.getRuntime().addShutdownHook(new Thread(() -> releaseLock()));
    }

    private void releaseLock() {
        try {
            if (lock != null) {
                lock.release();
                channel.close();
                file.delete();
            }
        } catch (Exception ex) {
            throw new RuntimeException("Unable to release application process lock", ex);
        }
    }
}
Another alternative is to use a library that does this for you, like JUnique. I haven't tried it myself, but you could give it a go. It seems very old, but I guess there isn't much that needs to change in something like this; not much has changed in NIO since Java 1.4.
http://www.sauronsoftware.it/projects/junique/
It is on Maven Central though so you can import it easily.
https://mvnrepository.com/artifact/it.sauronsoftware/junique/1.0.4
If you look at the code you will see that it does the same thing with file locks:
https://github.com/poolborges/it.sauronsoftware.junique/blob/master/src/main/java/it/sauronsoftware/junique/JUnique.java
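For illustration, a minimal JUnique usage sketch (the lock id string is hypothetical; including the OS user name in it keeps one lock per user):

import it.sauronsoftware.junique.AlreadyLockedException;
import it.sauronsoftware.junique.JUnique;

public class Main {

    public static void main(String[] args) {
        // one lock id per OS user, so Alice and Bob can each run one instance
        String id = "com.example.myapp-" + System.getProperty("user.name");
        try {
            JUnique.acquireLock(id);
        } catch (AlreadyLockedException e) {
            // another instance is already running for this user
            System.exit(0);
        }
        // ... launch the application as usual ...
    }
}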
As for question 1: on Windows, install4j launchers create a semaphore with the CreateSemaphore function in the Windows API. You can check the name of the semaphore by executing the launcher from the command line with the /create-i4j-log argument.
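If you wanted to replicate a check along those lines yourself, here is a rough sketch using a named mutex via JNA's Kernel32 mapping (a mutex rather than install4j's semaphore, and the object name is hypothetical):

import com.sun.jna.platform.win32.Kernel32;
import com.sun.jna.platform.win32.WinError;
import com.sun.jna.platform.win32.WinNT;

public class SingleInstanceCheck {

    // returns true if this is the first instance in the current session
    public static boolean tryAcquire() {
        // "Local\" scopes the kernel object to the current terminal-server session
        String name = "Local\\com.example.myapp-" + System.getProperty("user.name");
        WinNT.HANDLE handle = Kernel32.INSTANCE.CreateMutex(null, false, name);
        if (handle == null) {
            return false; // could not create the mutex at all
        }
        // ERROR_ALREADY_EXISTS means another process already created this name
        return Kernel32.INSTANCE.GetLastError() != WinError.ERROR_ALREADY_EXISTS;
    }
}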
I faced the same issue and solved it by using a FileLock, like the other answer.
In my case, the arguments passed to newly launched processes needed to be forwarded to the first process. For this I used a named pipe whose name includes the username. The first process creates the named pipe at \\.\pipe\app_$USER. If the same exe is started again by the same user, this is detected via the FileLock, and the arguments are passed through the named pipe.
Background:
I have a requirement that messages displayed to the user must vary both by language and by company division. Thus, I can't use out-of-the-box resource bundles, so I'm essentially writing my own version of resource bundles using PropertiesConfiguration files.
In addition, I have a requirement that messages must be modifiable dynamically in production w/o doing restarts.
I'm loading up three different iterations of property files:
-basename_division.properties
-basename_2CharLanguageCode.properties
-basename.properties
These files exist in the classpath. This code is going into a tag library to be used by multiple portlets in a Portal.
I construct the possible .properties files, and then try to load each of them via the following:
PropertiesConfiguration configurationProperties;
try {
    configurationProperties = new PropertiesConfiguration(propertyFileName);
    configurationProperties.setReloadingStrategy(new FileChangedReloadingStrategy());
} catch (ConfigurationException e) {
    /* This is ok -- it just means that the specific configuration file doesn't
       exist right now, which will often be true. */
    return null;
}
If it successfully locates a file, it saves the created PropertiesConfiguration into a HashMap for reuse and then tries to find the key. (Unlike regular resource bundles, if it doesn't find the key, it then tries the more general file to see if the key exists there, so that only override exceptions need to be put into language/division-specific property files.)
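A sketch of that fallback chain, for illustration (getConfiguration and the candidate list below are hypothetical names for the caching and file-construction logic just described):

private String getMessage(String baseName, String division, String language, String key) {
    // most specific candidate first, base file last
    String[] candidates = {
        baseName + "_" + division + ".properties",
        baseName + "_" + language + ".properties",
        baseName + ".properties"
    };
    for (String fileName : candidates) {
        // cached in the HashMap; null if the file could not be loaded
        PropertiesConfiguration config = getConfiguration(fileName);
        if (config != null && config.containsKey(key)) {
            return config.getString(key);
        }
    }
    return null; // key not found in any candidate
}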
The Problem:
If a file did not exist the first time it was checked, the expected exception is thrown. However, if a file is later dropped into the classpath and this code is re-run, the exception is still thrown. Restarting the portal obviously clears the problem, but that's not useful to me -- I need to be able to let them drop new messages in place for language/companyDivision overrides without a restart. And I'm not interested in creating blank files for all possible divisions, since there are quite a few divisions.
I'm assuming this is a classloader issue: it determines that the file did not exist in the classpath the first time, and caches that result when the same file is requested again. I'm not interested in doing anything too fancy with the classloader. (I'd be the only one who would be able to understand/maintain that code.) The specific environment is WebSphere Portal.
Any ways around this or am I stuck?
My guess is that Apache's FileChangedReloadingStrategy does not report the equivalent of ENTRY_CREATE events on a file system directory.
If you're using Java 7, I propose the following: simply implement a new ReloadingStrategy using the Java 7 WatchService. That way, every time a file is changed in one of your target directories or a new property file is placed there, you can poll for the event and add the properties to your application.
If you're not on Java 7, a library such as JNotify may be a better way to get notified of a new entry in a directory. But again, you need to implement the ReloadingStrategy.
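For the Java 7 route, here is a rough sketch of such a strategy (assuming Commons Configuration 1.x's ReloadingStrategy interface; the event handling is deliberately minimal):

import java.io.File;
import java.nio.file.*;
import org.apache.commons.configuration.FileConfiguration;
import org.apache.commons.configuration.reloading.ReloadingStrategy;

public class WatchServiceReloadingStrategy implements ReloadingStrategy {

    private FileConfiguration configuration;
    private WatchService watchService;

    public void setConfiguration(FileConfiguration configuration) {
        this.configuration = configuration;
    }

    public void init() {
        try {
            File file = configuration.getFile();
            Path dir = file.getParentFile().toPath();
            watchService = dir.getFileSystem().newWatchService();
            // watch the configuration file's directory for new and changed files
            dir.register(watchService,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
        } catch (Exception e) {
            throw new IllegalStateException("Could not initialize WatchService", e);
        }
    }

    public boolean reloadingRequired() {
        WatchKey key = watchService.poll(); // non-blocking
        if (key == null) {
            return false;
        }
        boolean eventsSeen = !key.pollEvents().isEmpty();
        key.reset();
        return eventsSeen;
    }

    public void reloadingPerformed() {
        // nothing to do; the pending events were consumed in reloadingRequired()
    }
}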
UPDATE for Java 6:
PropertiesConfiguration configurationProperties;
try {
    configurationProperties = new PropertiesConfiguration(propertyFileName);
    configurationProperties.setReloadingStrategy(new FileChangedReloadingStrategy());
} catch (ConfigurationException e) {
    JNotify.addWatch(propertyFileDirectory, JNotify.FILE_CREATED, false, new FileCreatedListener());
}
where
class FileCreatedListener implements JNotifyListener {
    // other methods

    public void fileCreated(int watchId, String rootPath, String fileName) {
        try {
            configurationProperties = new PropertiesConfiguration(rootPath + "/" + fileName);
            configurationProperties.setReloadingStrategy(new FileChangedReloadingStrategy());
            // or any other business with configurationProperties
        } catch (ConfigurationException e) {
            // the newly created file could not be loaded; log or ignore
        }
    }
}
I am using Apache JCI's FAM (FilesystemAlterationMonitor) in a Java OSGi service to monitor and handle changes in the file system. Everything seems to work fairly well, except that whenever I start the service (which starts FAM using the code below), FAM picks up ALL the contents that already exist in the directory as changes.
Currently I am watching /tmp
/tmp includes a subtree: /tmp/foo/bar/cat/dog
Every time I start the service, which starts FAM, it reports DirectoryCreate events for:
/tmp/foo
/tmp/foo/bar
/tmp/foo/bar/cat
/tmp/foo/bar/cat/dog
Even if no changes have been made to any part of that subtree.
Code run on service activation:
File watchFolder = new File("/tmp");
watchFolder.mkdirs();
fam = new FilesystemAlterationMonitor();
fam.setInterval(1000);
fam.addListener(watchFolder, listener);
fam.start();
// I've already tried adding:
listener.waitForFirstCheck();
Listener example:
private FileChangeListener listener = new FileChangeListener() {
    public void onDirectoryChange(File pDir) { System.out.println(pDir.getAbsolutePath()); }
    public void onDirectoryCreate(File pDir) { System.out.println(pDir.getAbsolutePath()); }
    ...
};
Yes, that's one very annoying feature of JCI. When monitoring is started, it will notify you of all the files and directories it finds with calls to onXxxCreate(). I think you have the following options:
1. After starting the monitoring, wait for some time (a couple of seconds) in your FileChangeListener callback implementation before you actually process the events coming from JCI. That's what I did in a project, and it works fairly well, although there is the possibility that you miss an actual file creation that happens within the "grace period". (A sketch of this approach follows the list.)
2. Take the sources of JCI and modify them to use two new event methods, onDirectoryFound(File) and onFileFound(File), that are only fired when files and directories are found at startup of the monitoring.
3. Take a look at java.nio.file.WatchService, which comes with Java 7. IMO the best option, as it uses native methods internally in order to be notified of changes by the OS, instead of starting a thread and checking periodically. With JCI, you may get delays in the range of several seconds until changes are propagated to your callbacks.
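A minimal sketch of option 1, following the listener shape from the question (the two-second grace period is an arbitrary choice):

private final long startedAt = System.currentTimeMillis();
private static final long GRACE_PERIOD_MS = 2000;

private boolean inGracePeriod() {
    return System.currentTimeMillis() - startedAt < GRACE_PERIOD_MS;
}

private FileChangeListener listener = new FileChangeListener() {
    public void onDirectoryCreate(File pDir) {
        if (inGracePeriod()) {
            return; // most likely a pre-existing directory reported at startup
        }
        System.out.println(pDir.getAbsolutePath());
    }
    // apply the same guard in the other callbacks
    ...
};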
Forget about WatchService. It is not intuitive, and there are issues with it when trying to detect whether the folder it is monitoring has been deleted or changed. I would stay far away from it. I have worked with the Watcher API but prefer Apache Commons IO much more. I believe Camel uses it as well.