I am trying to integrate multithreading with the FileWatcher service in Java. That is, I am constantly listening to a particular directory, and whenever a new file is created I need to spawn a new thread that processes the file (say, prints its contents). I managed to write code that compiles and works, but not as expected: it runs sequentially, meaning file2 is processed after file1, file3 after file2, and so on. I want the files to be processed in parallel.
Adding the code snippet:
while (true) {
    WatchKey key;
    try {
        key = watcher.take();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
    }
    Path dir = keys.get(key);
    for (WatchEvent<?> event : key.pollEvents()) {
        WatchEvent.Kind<?> kind = event.kind();
        if (kind == StandardWatchEventKinds.OVERFLOW) {
            continue;
        }
        if (kind == StandardWatchEventKinds.ENTRY_CREATE) {
            log.info("New entry is created in the listening directory, calling the FileProcessor");
            WatchEvent<Path> ev = (WatchEvent<Path>) event;
            Path newFileCreatedResolved = dir.resolve(ev.context());
            try {
                FileProcessor processFile = new FileProcessor(newFileCreatedResolved.getFileName().toString());
                Future<String> result = executor.submit(processFile);
                try {
                    System.out.println("Processed file " + result.get());
                } catch (ExecutionException e) {
                    e.printStackTrace();
                }
                // executor.shutdown(); add logic to shut down
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    boolean valid = key.reset();
    if (!valid) {
        break;
    }
}
and the FileProcessor class
public class FileProcessor implements Callable<String> {

    private final String triggerFile;

    FileProcessor(String triggerFile) throws FileNotFoundException, IOException {
        this.triggerFile = triggerFile;
    }

    public String call() throws Exception {
        // logic to write to another file; the new file is specific to the input file
        // returns "success"
    }
}
What is happening now: if I transfer 3 files at a time, they are processed sequentially. First file1 is written to its destination file, then file2, then file3, and so on.
Am I making sense? Which part do I need to change to make it parallel? Or is ExecutorService designed to work like that?
The call to Future.get() is blocking: the result isn't available until processing is complete, and your code doesn't submit another task until then.
Wrap your Executor in a CompletionService and submit() tasks to it instead. Have another thread consume the results of the CompletionService to do any processing that is necessary after a task completes.
Alternatively, you can use the helper methods of CompletableFuture to set up an equivalent pipeline of actions.
A third, simpler, but perhaps less flexible option is simply to incorporate the post-processing (here, the println) into the task itself, for example by wrapping it around the original Callable.
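A minimal, self-contained sketch of the CompletionService approach (the pool size, task bodies, and consumer thread here are illustrative, not the asker's code): the watcher loop only submits tasks and moves on, while a separate consumer thread blocks on completed results.

```java
import java.util.concurrent.*;

public class Demo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CompletionService<String> completionService = new ExecutorCompletionService<>(executor);

        // Consumer thread: handles results in completion order,
        // without ever blocking the watcher loop.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Future<String> done = completionService.take(); // blocks until a task completes
                    System.out.println("Processed file: " + done.get());
                }
            } catch (InterruptedException | ExecutionException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        // In the watcher loop you would submit and immediately continue:
        for (int i = 1; i <= 3; i++) {
            final String name = "file" + i;
            completionService.submit(() -> {
                Thread.sleep(100); // simulate per-file processing
                return name;
            });
        }

        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        Thread.sleep(200); // give the demo consumer time to drain
    }
}
```

Because the watcher thread never calls get(), all three simulated files are processed in parallel on the pool.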
Related
I have a watch service running on a folder. When I try to modify an existing file using eventKind == ENTRY_MODIFY (basically pasting the same file without removing the current one), I get a FileNotFoundException (The process cannot access the file because it is being used by another process.)
if (eventKind == StandardWatchEventKinds.ENTRY_MODIFY) {
    String newFileChecksum = null;
    if (eventPath.toFile().exists()) {
        newFileChecksum = getFileChecksum(eventPath.toFile());
    }
    if (fileMapper.containsKey(eventPath)) {
        String existingFileChecksum = fileMapper.get(eventPath);
        if (!existingFileChecksum.equals(newFileChecksum)) {
            fileMapper.replace(eventPath, existingFileChecksum, newFileChecksum);
            log.info("listener.filemodified IN");
            for (DirectoryListener listener : this.listeners) {
                listener.fileModified(this, eventPath);
            }
            log.info("listener.filemodified OUT");
        } else {
            log.info("existing checksum");
            log.debug(String.format(
                "Checksum for file [%s] has not changed. Skipping plugin processing.",
                eventPath.getFileName()));
        }
    }
}
In the code, getFileChecksum() is called here:
if (eventPath.toFile().exists()) {
    newFileChecksum = getFileChecksum(eventPath.toFile());
}
So ideally eventPath.toFile().exists() is TRUE, hence the code goes inside the if; but when getFileChecksum() is called, it goes to this method...
private synchronized String getFileChecksum(File file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md5Digest = MessageDigest.getInstance("MD5");
    FileInputStream fis = null;
    if (file.exists()) {
        try {
            fis = new FileInputStream(file);
        } catch (Exception e) {
            e.printStackTrace();
        }
    } else {
        log.warn("File not detected.");
    }
    byte[] byteArray = new byte[1024];
    int bytesCount = 0;
    while ((bytesCount = fis.read(byteArray)) != -1) {
        md5Digest.update(byteArray, 0, bytesCount);
    }
    fis.close();
    byte[] bytes = md5Digest.digest();
    StringBuilder stringBuilder = new StringBuilder();
    for (int i = 0; i < bytes.length; i++) {
        stringBuilder.append(Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
    }
    return stringBuilder.toString();
}
The exception is thrown at fis = new FileInputStream(file); even though the file is present in the folder:
FileNotFoundException (The process cannot access the file because it is being used by another process.)
I created a RandomAccessFile and a channel to release any lock placed on the file, but it is not working. Please suggest what could be happening here.
// UPDATE --> This is the infinite while loop that I have.
What is happening: when I put a file in the folder, 1 CREATE and 2 MODIFY events are fired; when I delete the file, 1 DELETE and 1 MODIFY are fired; and if I put the same file back into the folder, I get a CREATE, but before the CREATE handling finishes a MODIFY is fired, so the create logic does not run and the modify logic runs instead.
I fixed this issue by putting Thread.sleep(500) between these lines:
WatchKey wk = watchService.take();
Thread.sleep(500);
for (WatchEvent<?> event : wk.pollEvents()) {
But I don't think I can justify the use of sleep here. Please help.
WatchService watchService = null;
WatchKey watchKey = null;
while (!this.canceled && (watchKey == null)) {
    watchService = watchService == null
            ? FileSystems.getDefault().newWatchService() : watchService;
    watchKey = this.directory.register(watchService,
            StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE,
            StandardWatchEventKinds.ENTRY_CREATE);
}
while (!this.canceled) {
    try {
        WatchKey wk = watchService.take();
        for (WatchEvent<?> event : wk.pollEvents()) {
            Kind<?> eventKind = event.kind();
            System.out.println("Event kind : " + eventKind);
            Path dir = (Path) wk.watchable();
            Path eventPath = (Path) event.context();
            Path fullPath = dir.resolve(eventPath);
            fireEvent(eventKind, fullPath);
        }
        wk.reset();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}
I have a better approach: retry in a while loop on a flag such as isFileReady until the file can actually be opened.
boolean isFileReady = false;
while (!isFileReady) {
    try (FileInputStream fis = new FileInputStream(file)) {
        isFileReady = true;
    } catch (IOException e) {
        // file not ready yet; log it or back off briefly before retrying
    }
}
This will solve your problem.
The WatchService is verbose and may report multiple ENTRY_MODIFY events for a single save operation - even while another application is partway through writing, or is writing repeatedly. Your code is probably acting on a modify event while the other app is still writing, with a second ENTRY_MODIFY still on its way.
A safer strategy for using the WatchService is to collate the events you receive and only act on the changes when there is a pause. Something like this will block on the first event but then poll the watch service with a small timeout to see if more changes arrive before acting on the previous set:
WatchService ws = ...
HashSet<Path> modified = new HashSet<>();
while (appIsRunning) {
    int countNow = modified.size();
    WatchKey k = countNow == 0 ? ws.take() : ws.poll(1, TimeUnit.MILLISECONDS);
    if (k != null) {
        // Loop through k.pollEvents() and put each modified file path into the modified set.
        // DO NOT CALL fireEvent here; save the path instead:
        ...
        if (eventKind == ENTRY_MODIFY)
            modified.add(filePath);
        k.reset();
    }
    // Don't act on changes unless no new events arrived:
    if (countNow == modified.size()) {
        // ACT ON the modified set here - the watch service did not report new changes
        for (Path filePath : modified) {
            // call fireEvent HERE:
            fireEvent(filePath);
        }
        // Reset the set so the next watch call is take(), not poll(1):
        modified.clear();
    }
}
If you are also watching for CREATE and DELETE operations along with MODIFY, you will have to collate and ignore some of the earlier events, because the last recorded event type can take precedence over a previously recorded type. For example, when calling take() and then poll(1) until nothing new is reported:
Any DELETE then CREATE => you might want to treat it as MODIFY
Any CREATE then MODIFY => you might want to treat it as CREATE
Any CREATE or MODIFY then DELETE => treat it as DELETE
Your logic would then only act when the value of modified.size() + created.size() + deleted.size() stops changing between runs.
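Those precedence rules can be collated per path with a small helper; this is an illustrative sketch (the class and method names are invented), not part of the answer's original code:

```java
import java.nio.file.*;
import java.util.*;

import static java.nio.file.StandardWatchEventKinds.*;

public class EventCollator {
    // Latest effective event kind per path, applying the precedence rules above.
    private final Map<Path, WatchEvent.Kind<Path>> collated = new HashMap<>();

    public void record(WatchEvent.Kind<Path> kind, Path path) {
        WatchEvent.Kind<Path> previous = collated.get(path);
        if (previous == null) {
            collated.put(path, kind);
        } else if (previous == ENTRY_DELETE && kind == ENTRY_CREATE) {
            collated.put(path, ENTRY_MODIFY);   // DELETE then CREATE => MODIFY
        } else if (previous == ENTRY_CREATE && kind == ENTRY_MODIFY) {
            collated.put(path, ENTRY_CREATE);   // CREATE then MODIFY => still CREATE
        } else if (kind == ENTRY_DELETE) {
            collated.put(path, ENTRY_DELETE);   // anything then DELETE => DELETE
        } else {
            collated.put(path, kind);
        }
    }

    // Called once the poll(1) loop reports no new events: hand back the
    // collated result and reset for the next batch.
    public Map<Path, WatchEvent.Kind<Path>> drain() {
        Map<Path, WatchEvent.Kind<Path>> result = new HashMap<>(collated);
        collated.clear();
        return result;
    }
}
```

Each incoming event would be fed to record() instead of fireEvent(), and drain() replaces the act-on-modified step in the loop above.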
Let me guess...
The modify event fires when you modify a file. To modify the file you most likely use a separate tool, such as Notepad, that opens and LOCKS the file.
Your watcher gets an event that the file is being modified (right now), but you cannot open it again (which FileInputStream wants to do) since it is already locked.
A process I've been working on for a little while now. The process was running fine until performance took a hit. I figured out a way to get it to perform very fast, but I'm really unsure what is happening behind the scenes. It's now throwing warnings and errors and I'm not sure what to do. The file is getting processed, but I'm not sure whether all threads are complete, and I don't believe I am shutting down the app correctly. Here is everything you need to know...
The file is read using a buffered reader, and we run some data quality checks on each record. For every record that is read and passes the checks, we create a Java object from it and insert it into a List. Once the List is 1000 objects big, we call an OracleService class, which has a Repo autowired, and execute a saveAll method with the List. We then continue reading the file and repeat this until the file is done being read. I am passing an ExecutorService object in to the service, so every time we call that service it gets a new List object containing my objects (this object is basically the table we are loading) and the ExecutorService object. The process runs fine but throws a ton of exceptions once I try to shut down. Here is all my code...
My Controller class run method. This will get called from another class which implements CommandLineRunner
public void run() throws ParseException, IOException, InterruptedException {
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load starting ********************");
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
        String line = reader.readLine();
        int count = 0;
        TrialBalanceBuilder builder = new TrialBalanceBuilder();
        while (line != null) {
            if (line.startsWith("D")) {
                if (dataQuality(line)) {
                    TrialBalance trialBalance = builder.buildTrialBalanceObject(line, procDt, time);
                    insertList.add(trialBalance);
                    count++;
                    if (count == 1000) {
                        oracleService.loadToTableTrialBalance(insertList, executorService);
                        count = 0;
                        insertList.clear();
                    }
                } else {
                    logger.info("Data quality check FAILED for record: " + line);
                    oracleService.revertInserts("DDA_TRIAL_BAL_STG", procDt.toString());
                    System.exit(111);
                }
            }
            line = reader.readLine();
        }
        logger.info("Leftover record count is " + insertList.size());
        oracleService.loadToTableTrialBalance(insertList, executorService);
    } catch (IOException e) {
        e.printStackTrace();
    }
    logger.info("Updating Metadata table with new batch proc date");
    InclearingBatchMetadataBuilder inclearingBatchMetadataBuilder = new InclearingBatchMetadataBuilder();
    InclearingBatchMetadata inclearingBatchMetadata = inclearingBatchMetadataBuilder.buildInclearingBatchMetadataObject("DDA_TRIAL_BAL_STG", procDt, time, Constants.bankID);
    oracleService.insertBatchProcDtIntoMetaTable(inclearingBatchMetadata);
    logger.info("Successfully updated Metadata table with new batch proc date: " + procDt);
    Thread.sleep(10000);
    oracleService.cleanUpGOS("DDA_TRIAL_BAL_STG", 1);
    executorService.shutdownNow();
    logger.info("******************** Aegis Check Inclearing DDA Trial Balance Table Load ended successfully ********************");
}
I'm passing in an ExecutorService object to the service class. This is defined as...
private final ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("Orders-%d").setDaemon(true).build();
private ExecutorService executorService = Executors.newFixedThreadPool(10, threadFactory);
My service class looks as such....
@Service("oracleService")
public class OracleService {

    private static final Logger logger = LoggerFactory.getLogger(OracleService.class);

    @Autowired
    TrialBalanceRepo trialBalanceRepo;

    @Transactional
    public void loadToTableTrialBalance(List<TrialBalance> trialBalanceList, ExecutorService executorService) {
        logger.debug("Let's load to the database");
        logger.debug(trialBalanceList.toString());
        List<TrialBalance> multiThreadList = new ArrayList<>(trialBalanceList);
        try {
            executorService.execute(() -> trialBalanceRepo.saveAll(multiThreadList));
        } catch (ConcurrentModificationException | DataIntegrityViolationException ignored) {}
        logger.debug("Successfully loaded to database");
    }
}
In my run method I then call a few more methods in that service class which create native queries and execute them on the database (for purging etc.).
Anyway, I never know when the threads are complete. And I am finding in pre-production, when running with a lot of data, that we shut down the app before all the data is completely loaded. Also, I don't know if this is even the best design. Do I keep passing in these ExecutorService objects? The whole point of this was to get optimal parallelism going so that our performance was better. Perhaps there is a better way (preferably without redesigning the entire app and using something other than JPA).
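One likely culprit in the run method above is executorService.shutdownNow(), which interrupts running tasks and discards queued ones, so in-flight saveAll batches can be lost. A common pattern is an orderly shutdown that waits for submitted tasks to finish; this is a sketch (the pool size, timeout, and simulated work are arbitrary), not the asker's code:

```java
import java.util.concurrent.*;

public class ShutdownDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService executorService = Executors.newFixedThreadPool(10);

        // Submit work as the file is read (simulated here).
        for (int i = 0; i < 5; i++) {
            executorService.execute(() -> {
                try {
                    Thread.sleep(100); // simulate a saveAll() batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Orderly shutdown: stop accepting new tasks, then wait for
        // already-submitted tasks to finish before exiting.
        executorService.shutdown();
        if (!executorService.awaitTermination(10, TimeUnit.MINUTES)) {
            // Timed out: fall back to cancelling whatever is still running.
            executorService.shutdownNow();
        }
        System.out.println("All batches flushed");
    }
}
```

After awaitTermination returns true, every submitted saveAll has completed, so the metadata update and cleanup can safely follow it.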
I'm trying to build a process that will watch a list of directories (populated via JPA); when a new file is detected in a folder, a new thread is started to process that folder. At most one thread should run per folder, but multiple threads can run across different folders.
I've got that working somewhat with the code below, but here is the issue I've found: say 1 out of 5 files has been moved so far. A thread is made immediately once the first is detected, and the ProcessDatasource thread then loops through the directory and creates one file object to process. In the meantime, the other 4 files trigger the file system watcher but are blocked because a datasource thread is already running on that folder. Since the watcher has already fired when those files landed, it won't fire again, which leaves those 4 files in limbo until another file lands in that folder...
To solve this I thought that, if a file lands while a thread is already running, I could call a method within the thread to add the file to the list of files it's currently processing, but I'm struggling to do that when the threads are made dynamically in the loop below. Of course this could just be an awful way of doing all this, so I'm open to any suggestions.
private boolean checkThreadRunning(String threadName) {
    Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
    for (Thread t : threadSet) {
        if (t.getThreadGroup() == Thread.currentThread().getThreadGroup() && t.getName().equals(threadName)) {
            return true;
        }
    }
    return false;
}

public void run(String... args) throws IOException, InterruptedException {
    WatchService watchService = FileSystems.getDefault().newWatchService();
    List<DataSource> datasourceList = readDataSources(); // Load a list of DataSource objects into the datasourceList.
    Map<WatchKey, DataSource> keys = registerKeys(watchService, datasourceList);
    WatchKey key;
    while ((key = watchService.take()) != null) {
        DataSource dataSource = keys.get(key);
        for (WatchEvent<?> event : key.pollEvents()) {
            String dataSourceName = dataSource.getDatasourceName();
            String threadName = "datasourceThread-" + dataSourceName;
            // Check if there is already a thread running on this datasource (folder)
            if (checkThreadRunning(threadName)) {
                System.out.println("Found another file for datasource " + dataSourceName + " but an instance is already running");
                // Need something here to pass this new file into the currently running thread to be processed...
            } else {
                // If not then start a thread which will work through processing the files within the folder.
                new Thread(new ProcessDatasource(threadName, dataSource)).start();
            }
        }
        key.reset();
    }
}
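One way to avoid scanning live threads by name is to keep a per-datasource work queue: the watcher always enqueues the new file, and a worker is started only when none is active for that folder. This is a sketch under that assumption; FolderDispatcher, the process() placeholder, and the re-check logic are all invented here, not part of the question's code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.*;

public class FolderDispatcher {
    // One queue of pending files per datasource name.
    private final ConcurrentMap<String, BlockingQueue<Path>> queues = new ConcurrentHashMap<>();
    // Tracks whether a worker is currently draining each queue.
    private final ConcurrentMap<String, Boolean> running = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Called from the watcher loop for every event, instead of checkThreadRunning().
    public void fileArrived(String dataSourceName, Path file) {
        queues.computeIfAbsent(dataSourceName, k -> new LinkedBlockingQueue<>()).add(file);
        // Start a worker only if none is running for this folder.
        if (running.putIfAbsent(dataSourceName, Boolean.TRUE) == null) {
            pool.execute(() -> drain(dataSourceName));
        }
    }

    private void drain(String dataSourceName) {
        BlockingQueue<Path> queue = queues.get(dataSourceName);
        while (true) {
            Path file;
            while ((file = queue.poll()) != null) {
                process(file);
            }
            running.remove(dataSourceName);
            // Re-check: a file may have arrived between the last poll() and the
            // flag removal; if so, try to reclaim the flag and keep draining.
            if (queue.isEmpty() || running.putIfAbsent(dataSourceName, Boolean.TRUE) != null) {
                return;
            }
        }
    }

    protected void process(Path file) {
        System.out.println("Processing " + file); // placeholder for real per-file work
    }
}
```

The re-check after removing the running flag closes the window where a file arrives between the final poll() and the flag removal; without it, such a file would sit in the queue until the next event.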
I'm having an issue with WatchService. Here is a snippet of my code:
public void watch() {
    // define a folder root
    Path myDir = Paths.get(rootDir + "InputFiles/" + dirName + "/request");
    try {
        WatchService watcher = myDir.getFileSystem().newWatchService();
        myDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
        WatchKey watckKey = watcher.take();
        List<WatchEvent<?>> events = watckKey.pollEvents();
        for (WatchEvent event : events) {
            // stuff
        }
        watckKey.reset();
    } catch (Exception e) {
    }
}
*First of all, know that watch() is called inside an infinite loop.
The problem is that when creating multiple files at a time, some events go missing. For example, if I copy and paste three files into the ".../request" folder, only one gets caught; the others remain as if nothing happened, and no OVERFLOW event is triggered either. On some other computers and OSes it catches up to two files, but with 3 or more, the rest still go untouched.
I found a workaround though, but I don't think it's the best practice. This is the flow:
The process starts and then stops at
WatchKey watckKey = watcher.take();
as expected, (as per Processing events). Then, I drop 3 files together in "request" folder, thus, process resumes at
List<WatchEvent<?>> events = watckKey.pollEvents();
The issue is here. It seems like the thread goes through this line so fast that two CREATE events fall behind and are lost; only one is picked up. The workaround was to add an extra line right above it, like this:
Thread.sleep(1000);
List<WatchEvent<?>> events = watckKey.pollEvents();
This seems to be a solution, at least for three or somewhat more simultaneous files, but it's not scalable at all.
So, in conclusion, I would like to know whether there is a better solution to this issue. FYI, I'm running Windows 7 64-bit.
Thanks a lot in advance!
Be sure to reset your watchKey. Some of the aforementioned answers don't, which could explain dropped events as well. I recommend the examples given in the official Oracle documentation: https://docs.oracle.com/javase/tutorial/essential/io/notification.html
Beware that, even when used correctly, the reliability of file watch services depends heavily on the underlying OS. In general, they should be considered a best-effort mechanism that doesn't give a 100% guarantee.
If watch() is called inside an infinite loop, then you are creating the watch service an infinite number of times, hence the possibility of losing events. I would suggest the following: create the watch service once.
public void watchservice()
{
    Thread fileWatcher = new Thread(() ->
    {
        Path dataDir = Paths.get(rootDir + "InputFiles/" + dirName + "/request");
        try
        {
            WatchService watcher = dataDir.getFileSystem().newWatchService();
            dataDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            while (true)
            {
                WatchKey watckKey;
                try
                {
                    watckKey = watcher.take();
                }
                catch (Exception e)
                {
                    logger.error("watchService interrupted:", e);
                    return;
                }
                List<WatchEvent<?>> events = watckKey.pollEvents();
                for (WatchEvent<?> event : events)
                {
                    logger.debug("Event Type : " + event.kind() + " , File name found :" + event.context());
                    if (event.kind() != StandardWatchEventKinds.OVERFLOW)
                    {
                        // do your stuff
                    }
                }
                watckKey.reset();
            }
        }
        catch (Exception e)
        {
            logger.error("Error: ", e);
        }
    });
    fileWatcher.setName("File-Watcher");
    fileWatcher.setUncaughtExceptionHandler((Thread t, Throwable throwable) ->
    {
        logger.error("Error occurred in Thread " + t, throwable);
    });
    fileWatcher.start();
}
I'm currently using jpathwatch to watch for new files created in a folder. All fine, but I need to find out when a program has finished writing to a file.
The library's author describes on his website (http://jpathwatch.wordpress.com/faq/) how that's done, but somehow I don't have a clue how to do it. Maybe it's described a bit unclearly, or I just don't get it.
I would like to ask whether you could give me a snippet which demonstrates how to do that.
This is the basic construct:
public void run() {
    while (true) {
        WatchKey signalledKey;
        try {
            signalledKey = watchService.take();
        } catch (InterruptedException ix) {
            continue;
        } catch (ClosedWatchServiceException cwse) {
            break;
        }
        List<WatchEvent<?>> list = signalledKey.pollEvents();
        signalledKey.reset();
        for (WatchEvent<?> e : list) {
            if (e.kind() == StandardWatchEventKind.ENTRY_CREATE) {
                Path context = (Path) e.context();
                String filename = context.toString();
                // do something
            } else if (e.kind() == StandardWatchEventKind.ENTRY_DELETE) {
                Path context = (Path) e.context();
                String filename = context.toString();
                // do something
            } else if (e.kind() == StandardWatchEventKind.OVERFLOW) {
            }
        }
    }
}
From the FAQ for jpathwatch, the author says that you will get ENTRY_MODIFY events regularly while a file is being written, and that those events stop being generated when writing is complete. He suggests keeping a list of files along with the timestamp of the last event generated for each file.
At some interval (which he refers to as a timeout), you scan through the list of files and their timestamps. If any file has a timestamp older than your timeout interval, that should mean it isn't being updated anymore and is probably complete.
He even suggests measuring the rate at which a file is growing and estimating when it should complete, so that you can set your poll time to the expected completion duration.
Does that clear it up at all? Sorry I'm not up to expressing that in code :)
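For what it's worth, the timestamp bookkeeping described above can be sketched roughly like this (the class name and the quietPeriodMillis threshold are invented for illustration): record the time of the last event per file, and periodically drain the files that have been quiet longer than the timeout.

```java
import java.util.*;

public class CompletionTracker {
    private final Map<String, Long> lastEventTime = new HashMap<>();
    private final long quietPeriodMillis;

    public CompletionTracker(long quietPeriodMillis) {
        this.quietPeriodMillis = quietPeriodMillis;
    }

    // Call this on every ENTRY_CREATE / ENTRY_MODIFY event for a file.
    public synchronized void touched(String filename) {
        lastEventTime.put(filename, System.currentTimeMillis());
    }

    // Call this periodically; returns files with no events for quietPeriodMillis,
    // which are assumed to be completely written, and forgets them.
    public synchronized List<String> drainCompleted() {
        long now = System.currentTimeMillis();
        List<String> done = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastEventTime.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() >= quietPeriodMillis) {
                done.add(entry.getKey());
                it.remove();
            }
        }
        return done;
    }
}
```

The watcher thread calls touched() on each event, and a scheduled task calls drainCompleted() at the chosen interval to hand finished files to the processor.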