I have a recursive watch service that I'm using to monitor directories while the application is running. For an unknown reason, the WatchService appears to stop working after about a day. At that point I can add a new file to a monitored directory and get no log statements, and my observers are not notified.
I thought Spring might be destroying the bean, so I added a log statement to the @PreDestroy method of the class, but that log statement doesn't show up after the WatchService stops working, so it seems the bean still exists; it's just not functioning as expected. The class is as follows:
import com.sun.nio.file.SensitivityWatchEventModifier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;
import java.io.File;
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
@Service
public class DirectoryMonitor {
private static final Logger logger = LoggerFactory.getLogger(DirectoryMonitor.class);
private WatchService watcher;
private ExecutorService executor;
private List<DirectoryMonitorObserver> observerList = new ArrayList<>();
private final Map<WatchKey, Path> keys = new HashMap<>();
public void addObserver(DirectoryMonitorObserver observer){
observerList.add(observer);
}
private void notifyObservers(){
observerList.forEach(DirectoryMonitorObserver::directoryModified);
}
@PostConstruct
public void init() throws IOException {
watcher = FileSystems.getDefault().newWatchService();
executor = Executors.newSingleThreadExecutor();
}
@PreDestroy
public void cleanup() {
try {
logger.info("Stopping directory monitor");
watcher.close();
} catch (IOException e) {
logger.error("Error closing watcher service", e);
}
executor.shutdown();
}
@SuppressWarnings("unchecked")
public void startRecursiveWatcher(String pathToMonitor) {
logger.info("Starting Recursive Watcher");
Consumer<Path> register = p -> {
if (!p.toFile().exists() || !p.toFile().isDirectory())
throw new RuntimeException("folder " + p + " does not exist or is not a directory");
try {
Files.walkFileTree(p, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
logger.info("registering " + dir + " in watcher service");
WatchKey watchKey = dir.register(watcher, new WatchEvent.Kind[]{ENTRY_CREATE, ENTRY_DELETE}, SensitivityWatchEventModifier.HIGH);
keys.put(watchKey, dir);
return FileVisitResult.CONTINUE;
}
});
} catch (IOException e) {
throw new RuntimeException("Error registering path " + p);
}
};
register.accept(Paths.get(pathToMonitor));
executor.submit(() -> {
while (true) {
final WatchKey key;
try {
key = watcher.take();
} catch (InterruptedException ex) {
logger.error(ex.toString());
continue;
}
final Path dir = keys.get(key);
key.pollEvents().stream()
.map(e -> ((WatchEvent<Path>) e).context())
.forEach(p -> {
final Path absPath = dir.resolve(p);
if (absPath.toFile().isDirectory()) {
register.accept(absPath);
} else {
final File f = absPath.toFile();
logger.info("Detected new file " + f.getAbsolutePath());
}
});
notifyObservers();
key.reset();
}
});
}
}
This is where I'm creating the monitor bean:
@Component
public class MovieInfoFacade {
@Value("${media.path}")
private String mediaPath;
private MovieInfoControl movieInfoControl;
private DirectoryMonitor directoryMonitor;
private FileListProvider fileListProvider;
@Autowired
public MovieInfoFacade(MovieInfoControl movieInfoControl, DirectoryMonitor directoryMonitor, FileListProvider fileListProvider){
this.movieInfoControl = movieInfoControl;
this.directoryMonitor = directoryMonitor;
this.fileListProvider = fileListProvider;
}
@PostConstruct
public void startDirectoryMonitor(){
if(!mediaPath.equalsIgnoreCase("none")) {
directoryMonitor.addObserver(fileListProvider);
directoryMonitor.startRecursiveWatcher(mediaPath);
}
}
public int loadMovieListLength(String directoryPath){
return fileListProvider.listFiles(directoryPath).length;
}
public List<MovieInfo> loadMovieList(MovieSearchCriteria searchCriteria) {
List<File> files = Arrays.asList(fileListProvider.listFiles(searchCriteria.getPath()));
return files.parallelStream()
.sorted()
.skip(searchCriteria.getPage() * searchCriteria.getItemsPerPage())
.limit(searchCriteria.getItemsPerPage())
.map(file -> movieInfoControl.loadMovieInfoFromCache(file.getAbsolutePath()))
.collect(Collectors.toList());
}
public MovieInfo loadSingleMovie(String filePath) {
return movieInfoControl.loadMovieInfoFromCache(filePath);
}
}
It appears that the error was in my exception handling. After removing the throw statements (and replacing them with logs) I have not had any issues.
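For reference, here is a minimal sketch of the guarded event loop, assuming the same fields as in DirectoryMonitor above and a register consumer that logs instead of throwing; the point is that nothing thrown while processing events is allowed to escape and silently kill the single executor thread:
executor.submit(() -> {
    while (true) {
        final WatchKey key;
        try {
            key = watcher.take();
        } catch (InterruptedException | ClosedWatchServiceException ex) {
            logger.info("Watcher interrupted or closed, stopping event loop");
            return;
        }
        try {
            final Path dir = keys.get(key);
            for (WatchEvent<?> event : key.pollEvents()) {
                final Path absPath = dir.resolve((Path) event.context());
                if (absPath.toFile().isDirectory()) {
                    register.accept(absPath); // register now logs problems instead of throwing
                } else {
                    logger.info("Detected new file " + absPath.toAbsolutePath());
                }
            }
            notifyObservers();
        } catch (RuntimeException ex) {
            // log and keep the watch thread alive rather than letting it die unnoticed
            logger.error("Error while processing watch events", ex);
        } finally {
            key.reset();
        }
    }
});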
I am trying to build an application that watches a folder and its subfolders to detect file creation or modification. The total number of files to watch will grow day by day.
I have tried the Java NIO WatchService and Apache Commons FileAlterationObserver. WatchService sometimes misses events when file creation/modification happens after the WatchKey is taken and before it is reset. Since FileAlterationObserver is based on polling, performance also degrades as the file count increases.
What will be the best approach to build such an application?
Thank you @DuncG. After going through the sample mentioned, I found the solution to my problem.
Adding this sample code in case someone is facing the same problem.
In the example I am adding all the events to a set (this removes duplicate events) and processing the saved events once no new WatchKey arrives within the poll delay. New directories are registered with the WatchService while processing the saved events.
package com.filewatcher;
import java.io.File;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class DirectoryWatchService implements Runnable { // renamed so it does not clash with the imported java.nio.file.WatchService
private static final long POLL_DELAY = 3;
private static final Logger LOGGER = LoggerFactory.getLogger(DirectoryWatchService.class);
private final WatchService watcher;
private final Map<WatchKey, Path> keys;
private final Set<Path> events = new HashSet<Path>();
public DirectoryWatchService(Path dir) throws IOException {
this.watcher = FileSystems.getDefault().newWatchService();
this.keys = new HashMap<WatchKey, Path>();
walkAndRegisterDirectories(dir);
}
@Override
public void run() {
while (true) {
try {
WatchKey key;
try {
key = watcher.poll(POLL_DELAY, TimeUnit.SECONDS);
} catch (InterruptedException x) {
return;
}
if (key != null) {
Path root = keys.get(key);
for (WatchEvent<?> event : key.pollEvents()) {
Path eventPath = (Path) event.context();
if (eventPath == null) {
System.out.println(event.kind());
continue;
}
Path fullPath = root.resolve(eventPath);
events.add(fullPath);
}
boolean valid = key.reset();
if (!valid) {
keys.remove(key);
}
} else {
if (events.size() > 0) {
processEvents(events);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
/**
* Process events and register new directory with watch service
* @param events
* @throws IOException
*/
private void processEvents(Set<Path> events) throws IOException {
for (Path path : events) {
// register directory with watch service if it's not already registered
if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS) && !this.keys.containsValue(path)) {
registerDirectory(path);
// Since the new directory was not registered before, list all files already inside it.
// new/modified files after this point will be reported by the watch service
File[] files = path.toFile().listFiles();
for (File file : files) {
LOGGER.info(file.getAbsolutePath());
}
} else {
LOGGER.info(path.toString());
}
}
// clear events once processed
events.clear();
}
/**
* Register a directory and its sub directories with watch service
* @param root the root folder
* @throws IOException
*/
private void walkAndRegisterDirectories(final Path root) throws IOException
{
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
registerDirectory(dir);
return FileVisitResult.CONTINUE;
}
});
}
/**
* Register a directory with watch service
* @param dir the directory to register
* @throws IOException
*/
private void registerDirectory(Path dir) throws IOException {
WatchKey key = dir.register(this.watcher, StandardWatchEventKinds.ENTRY_CREATE,
StandardWatchEventKinds.ENTRY_MODIFY);
this.keys.put(key, dir);
}
}
public class FileWatcherApplication implements CommandLineRunner {
@Value("${filewatch.folder}")
private String rootPath;
public static void main(String[] args) {
SpringApplication.run(FileWatcherApplication.class, args);
}
@Override
public void run(String... args) throws Exception {
File rootFolder = new File(rootPath);
if (!rootFolder.exists()) {
rootFolder.mkdirs();
}
new Thread(new DirectoryWatchService(Paths.get(rootPath)), "WatchThread").start();
}
}
Unable to use StreamingFileSink and store incoming events in compressed fashion.
I am trying to use StreamingFileSink to write an unbounded event stream to S3. In the process, I would like to compress the data to make better use of the available storage.
I wrote a compressed string writer by borrowing some code from Flink's SequenceFileWriterFactory. It fails with the exception described below.
If I try to use BucketingSink, it works great.
Using BucketingSink, I approached the compressed string write as below. Again, I borrowed this code from another pull request.
import org.apache.flink.streaming.connectors.fs.StreamWriterBase;
import org.apache.flink.streaming.connectors.fs.Writer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import java.io.IOException;
public class CompressionStringWriter<T> extends StreamWriterBase<T> implements Writer<T> {
private static final long serialVersionUID = 3231207311080446279L;
private String codecName;
private String separator;
public String getCodecName() {
return codecName;
}
public String getSeparator() {
return separator;
}
private transient CompressionOutputStream compressedOutputStream;
public CompressionStringWriter(String codecName, String separator) {
this.codecName = codecName;
this.separator = separator;
}
public CompressionStringWriter(String codecName) {
this(codecName, System.lineSeparator());
}
protected CompressionStringWriter(CompressionStringWriter<T> other) {
super(other);
this.codecName = other.codecName;
this.separator = other.separator;
}
@Override
public void open(FileSystem fs, Path path) throws IOException {
super.open(fs, path);
Configuration conf = fs.getConf();
CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
if (codec == null) {
throw new RuntimeException("Codec " + codecName + " not found");
}
Compressor compressor = CodecPool.getCompressor(codec, conf);
compressedOutputStream = codec.createOutputStream(getStream(), compressor);
}
@Override
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
} else {
super.close();
}
}
@Override
public void write(T element) throws IOException {
getStream();
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
@Override
public CompressionStringWriter<T> duplicate() {
return new CompressionStringWriter<>(this);
}
}
BucketingSink<DeviceEvent> bucketingSink = new BucketingSink<>("s3://"+ this.bucketName + "/" + this.objectPrefix);
bucketingSink
.setBucketer(new OrgIdBasedBucketAssigner())
.setWriter(new CompressionStringWriter<DeviceEvent>("Gzip", "\n"))
.setPartPrefix("file-")
.setPartSuffix(".gz")
.setBatchSize(1_500_000);
The one with BucketingSink works.
Now my code using StreamingFileSink involves the below set of code.
import org.apache.flink.api.common.serialization.BulkWriter;
import java.io.IOException;
public class CompressedStringBulkWriter<T> implements BulkWriter<T> {
private final CompressedStringWriter compressedStringWriter;
public CompressedStringBulkWriter(final CompressedStringWriter compressedStringWriter) {
this.compressedStringWriter = compressedStringWriter;
}
@Override
public void addElement(T element) throws IOException {
this.compressedStringWriter.write(element);
}
@Override
public void flush() throws IOException {
this.compressedStringWriter.flush();
}
@Override
public void finish() throws IOException {
this.compressedStringWriter.close();
}
}
import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
public class CompressedStringBulkWriterFactory<T> implements BulkWriter.Factory<T> {
private SerializableHadoopConfiguration serializableHadoopConfiguration;
public CompressedStringBulkWriterFactory(final Configuration hadoopConfiguration) {
this.serializableHadoopConfiguration = new SerializableHadoopConfiguration(hadoopConfiguration);
}
@Override
public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
return new CompressedStringBulkWriter(new CompressedStringWriter(out, serializableHadoopConfiguration.get(), "Gzip", "\n"));
}
}
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;
import org.apache.flink.util.Preconditions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
public class CompressedStringWriter<T> implements Serializable {
private static final Logger LOG = LoggerFactory.getLogger(CompressedStringWriter.class);
private static final long serialVersionUID = 2115292142239557448L;
private String separator;
private transient CompressionOutputStream compressedOutputStream;
public CompressedStringWriter(FSDataOutputStream out, Configuration hadoopConfiguration, String codecName, String separator) {
this.separator = separator;
try {
Preconditions.checkNotNull(hadoopConfiguration, "Unable to determine hadoop configuration using path");
CompressionCodecFactory codecFactory = new CompressionCodecFactory(hadoopConfiguration);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
Preconditions.checkNotNull(codec, "Codec " + codecName + " not found");
LOG.info("The codec name that was loaded from hadoop {}", codec);
Compressor compressor = CodecPool.getCompressor(codec, hadoopConfiguration);
this.compressedOutputStream = codec.createOutputStream(out, compressor);
LOG.info("Setup a compressor for codec {} and compressor {}", codec, compressor);
} catch (IOException ex) {
throw new RuntimeException("Unable to compose a hadoop compressor for the path", ex);
}
}
public void flush() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.flush();
}
}
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
}
}
public void write(T element) throws IOException {
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
}
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class SerializableHadoopConfiguration implements Serializable {
private static final long serialVersionUID = -1960900291123078166L;
private transient Configuration hadoopConfig;
SerializableHadoopConfiguration(Configuration hadoopConfig) {
this.hadoopConfig = hadoopConfig;
}
Configuration get() {
return this.hadoopConfig;
}
// --------------------
private void writeObject(ObjectOutputStream out) throws IOException {
this.hadoopConfig.write(out);
}
private void readObject(ObjectInputStream in) throws IOException {
final Configuration config = new Configuration();
config.readFields(in);
if (this.hadoopConfig == null) {
this.hadoopConfig = config;
}
}
}
My actual Flink job:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties kinesisConsumerConfig = new Properties();
...
...
DataStream<DeviceEvent> kinesis =
env.addSource(new FlinkKinesisConsumer<>(this.streamName, new DeviceEventSchema(), kinesisConsumerConfig)).name("source")
.setParallelism(16)
.setMaxParallelism(24);
final StreamingFileSink<DeviceEvent> bulkCompressStreamingFileSink = StreamingFileSink.<DeviceEvent>forBulkFormat(
path,
new CompressedStringBulkWriterFactory<>(
BucketingSink.createHadoopFileSystem(
new Path("s3a://"+ this.bucketName + "/" + this.objectPrefix),
null).getConf()))
.withBucketAssigner(new OrgIdBucketAssigner())
.build();
deviceEventDataStream.addSink(bulkCompressStreamingFileSink).name("bulkCompressStreamingFileSink").setParallelism(16);
env.execute();
I expect data to be saved in S3 as multiple files. Unfortunately, no files are being created.
In the logs, I see the exception below:
2019-05-15 22:17:20,855 INFO org.apache.flink.runtime.taskmanager.Task - Sink: bulkCompressStreamingFileSink (11/16) (c73684c10bb799a6e0217b6795571e22) switched from RUNNING to FAILED.
java.lang.Exception: Could not perform checkpoint 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:595)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not complete snapshot 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:422)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
... 8 more
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
at org.apache.flink.streaming.api.functions.sink.filesystem.PartFileWriter.closeForCommit(PartFileWriter.java:71)
at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:63)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:239)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:280)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:253)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:244)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:235)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.snapshotState(StreamingFileSink.java:347)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:395)
So I am wondering what I am missing.
I am using the latest AWS EMR (5.23).
In CompressedStringBulkWriter#finish() (via CompressedStringWriter#close()), you are calling the close() method on the CompressionOutputStream, which also closes the underlying stream, i.e. Flink's FSDataOutputStream. That stream has to remain open for checkpointing to be done properly by Flink's internals, to guarantee a recoverable stream. That is why you are getting:
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
So instead of compressedOutputStream.close(), use compressedOutputStream.finish(), which just flushes everything that's in the buffer to the output stream without closing it. BTW, there is a built-in HadoopCompressionBulkWriter available in the latest version of Flink; you can also use that.
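For illustration, the adjusted finish path could look roughly like this (a sketch based on the classes above; Hadoop's CompressionOutputStream.finish() writes out any remaining compressed data but leaves the wrapped FSDataOutputStream open for Flink to commit):
// sketch: in CompressedStringWriter, end the compressed stream without closing Flink's stream
public void finish() throws IOException {
    if (compressedOutputStream != null) {
        compressedOutputStream.finish(); // flush the compression buffer, keep the underlying stream open
        compressedOutputStream = null;
    }
}

// sketch: in CompressedStringBulkWriter, delegate BulkWriter.finish() to the method above
@Override
public void finish() throws IOException {
    this.compressedStringWriter.finish();
}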
For my current side project I need to use a WatchService to track events in a given directory. My code is currently based mainly on Oracle's WatchService tutorial example.
However, I need it to be limited to folder-only events (e.g. ENTRY_CREATE C:\temp\folder_a).
What I'm trying to do is take an initial snapshot of the directory's content
and store each content's path into either dirCache or fileCache.
If a new event is registered, this should be checked:
is the event context a file in fileCache, or
is the event context a new file (-> Files.isRegularFile)?
So both new file events and events from files that are already in the cache should be discarded.
But printing out the events produces
ENTRY_DELETE: C:\temp\k.txt
for files but no ENTRY_CREATE or ENTRY_MODIFY.
What am I doing wrong? Am I not checking against the cache correctly or is it something completely different?
Here's the current code base:
public class Main {
public static void main(String[] args) {
try {
new DirectoryWatcher(Paths.get("C:\\temp")).processEvents();
} catch (IOException e) {
e.printStackTrace();
}
}
}
DirectoryWatcher Class
package service;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchEvent.Kind;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashMap;
import java.util.Map;
/**
* Slightly modified version of Oracle
* example file WatchDir.java
*/
public class DirectoryWatcher {
private final Path path;
private final WatchService watcher;
private final Map<WatchKey,Path> keys;
private PathSnapshot pathSnapshot;
private boolean trace = false;
@SuppressWarnings("unchecked")
static <T> WatchEvent<T> cast(WatchEvent<?> event) {
return (WatchEvent<T>)event;
}
/**
* Register the given directory with the WatchService
*/
private void register(Path dir) throws IOException {
WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
if (trace) {
Path prev = keys.get(key);
if (prev == null) {
System.out.format("register: %s\n", dir);
} else {
if (!dir.equals(prev)) {
System.out.format("update: %s -> %s\n", prev, dir);
}
}
}
keys.put(key, dir);
}
public DirectoryWatcher(Path dir) throws IOException {
this.watcher = FileSystems.getDefault().newWatchService();
this.keys = new HashMap<WatchKey,Path>();
this.path = dir;
this.pathSnapshot = new PathSnapshot(dir);
register(dir);
// enable trace after initial registration
this.trace = true;
}
/**
* Process all events for keys queued to the watcher
*/
void processEvents() {
for (;;) {
// wait for key to be signaled
WatchKey key;
try {
key = watcher.take();
} catch (InterruptedException x) {
return;
}
Path dir = keys.get(key);
if (dir == null) {
System.err.println("WatchKey not recognized!!");
continue;
}
for (WatchEvent<?> event: key.pollEvents()) {
Kind<?> kind = event.kind();
// TBD - provide example of how OVERFLOW event is handled
if (kind == OVERFLOW) {
continue;
}
// Context for directory entry event is the file name of entry
WatchEvent<Path> ev = cast(event);
Path name = ev.context();
Path child = dir.resolve(name);
this.updateDirContent();
/*
* currently: file-creation events are discarded as intended,
* but deleting a file still produces an event which is printed
* TODO: disregard delete events that originate from files
*/
boolean isFile = Files.isRegularFile(child);
if (pathSnapshot.isInFileCache(child)|| isFile) {
//disregard the event if file
event = null;
} else {
// print out event
System.out.format("%s: %s\n", event.kind().name(), child);
}
}
// reset key and remove from set if directory no longer accessible
boolean valid = key.reset();
if (!valid) {
keys.remove(key);
// all directories are inaccessible
if (keys.isEmpty()) {
break;
}
}
}
}
private void updateDirContent() {
this.pathSnapshot = pathSnapshot.updateSnapshot(path);
}
}
PathSnapshot Class
package service;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.stream.Stream;
public class PathSnapshot {
public ArrayList<Path> dirCache = new ArrayList<Path>();
public ArrayList<Path> fileCache = new ArrayList<Path>();
public PathSnapshot(Path dir) {
try {
Stream<Path> rawDirContent = Files.walk(
dir, 1);
Object[] dirContent = rawDirContent.toArray();
rawDirContent.close();
sortIntoCache(dirContent, dir);
} catch (IOException e) {
e.printStackTrace();
}
}
private void sortIntoCache(Object[] dirContent, Path rootdir) {
for (Object object : dirContent) {
//create path from element
Path objectPath = Paths.get(object.toString());
//skip start path / the root directory
if (object.equals(rootdir)) {
continue;
} else if (Files.isRegularFile(objectPath)) {
fileCache.add(objectPath);
} else if (Files.isDirectory(objectPath)) {
dirCache.add(objectPath);
}
}
}
public boolean isInFileCache(Path path) {
if (fileCache.contains(path)) {
return true;
} else {
return false;
}
}
public boolean isInDirCache(Path path) {
if (dirCache.contains(path)) {
return true;
} else {
return false;
}
}
public PathSnapshot updateSnapshot(Path dir){
return new PathSnapshot(dir);
}
}
You are listening to all possible events from the file system, so there isn't more to ask for. Java can't do anything if the OS doesn't present more events or more detail. Some complex file system operations are just not represented by one event but by a sequence of basic events. So you have to make the best of the events and interpret what a sequence of events actually means.
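For the folder-only requirement above, one way to apply this is to classify each event against the snapshot taken before the event arrived, and only refresh the snapshot afterwards; a deleted entry no longer exists on disk, so Files.isRegularFile() can no longer tell you what it was, but the old fileCache can. A sketch of the event loop body, reusing the names from the DirectoryWatcher and PathSnapshot classes above:
for (WatchEvent<?> event : key.pollEvents()) {
    Kind<?> kind = event.kind();
    if (kind == OVERFLOW) {
        continue;
    }
    WatchEvent<Path> ev = cast(event);
    Path child = dir.resolve(ev.context());

    // classify using the snapshot taken BEFORE this event
    boolean wasKnownFile = pathSnapshot.isInFileCache(child);
    boolean isFileNow = Files.isRegularFile(child);

    if (!wasKnownFile && !isFileNow) {
        // not a previously known file and not a file on disk now -> treat as a folder event
        System.out.format("%s: %s%n", kind.name(), child);
    }

    // refresh the snapshot only after the event has been classified
    updateDirContent();
}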
EDIT: This does not seem to be possible, see https://bugs.openjdk.java.net/browse/JDK-8039910.
I have a helper class that provides a Stream<Path>. This code just wraps Files.walk and sorts the output:
public Stream<Path> getPaths(Path path) throws IOException {
return Files.walk(path, FOLLOW_LINKS).sorted();
}
As symlinks are followed, in case of loops in the filesystem (e.g. a symlink x -> .) the code used in Files.walk throws an UncheckedIOException wrapping an instance of FileSystemLoopException.
In my code I would like to catch such exceptions and, for example, just log a helpful message. The resulting stream could/should just stop providing entries as soon as this happens.
I tried adding .map(this::catchException) and .peek(this::catchException) to my code, but the exception is not caught at this stage.
Path catchException(Path path) {
try {
logger.info("path.toString() {}", path.toString());
return path;
} catch (UncheckedIOException exception) {
logger.error("YEAH");
return null;
}
}
How, if at all, can I catch an UncheckedIOException in my code giving out a Stream<Path>, so that consumers of the path do not encounter this exception?
As an example, the following code should never encounter the exception:
List<Path> paths = getPaths().collect(toList());
Right now, the exception is triggered by code invoking collect (and I could catch the exception there):
java.io.UncheckedIOException: java.nio.file.FileSystemLoopException: /tmp/junit5844257414812733938/selfloop
at java.nio.file.FileTreeIterator.fetchNextIfNeeded(FileTreeIterator.java:88)
at java.nio.file.FileTreeIterator.hasNext(FileTreeIterator.java:104)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at ...
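For reference, catching at that point would look roughly like the following (a sketch with a hypothetical rootPath variable and the logger from above; it gives up on the whole walk rather than skipping the offending entry, which is what I want to avoid):
List<Path> paths;
try (Stream<Path> stream = getPaths(rootPath)) {
    paths = stream.collect(Collectors.toList());
} catch (IOException | UncheckedIOException e) {
    logger.error("walk aborted: {}", e.toString());
    paths = Collections.emptyList();
}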
EDIT: I provided a simple JUnit test class. In this question I ask you to fix the test by just modifying the code in provideStream.
package somewhere;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import static java.nio.file.FileVisitOption.FOLLOW_LINKS;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;
import static org.hamcrest.Matchers.nullValue;
import static org.hamcrest.core.IsNot.not;
import static org.junit.Assert.fail;
public class StreamTest {
@Rule
public TemporaryFolder temporaryFolder = new TemporaryFolder();
@Test
public void test() throws Exception {
Path rootPath = Paths.get(temporaryFolder.getRoot().getPath());
createSelfloop();
Stream<Path> stream = provideStream(rootPath);
assertThat(stream.collect(Collectors.toList()), is(not(nullValue())));
}
private Stream<Path> provideStream(Path rootPath) throws IOException {
return Files.walk(rootPath, FOLLOW_LINKS).sorted();
}
private void createSelfloop() throws IOException {
String root = temporaryFolder.getRoot().getPath();
try {
Path symlink = Paths.get(root, "selfloop");
Path target = Paths.get(root);
Files.createSymbolicLink(symlink, target);
} catch (UnsupportedOperationException x) {
// Some file systems do not support symbolic links
fail();
}
}
}
You can make your own walking stream factory:
public class FileTree {
public static Stream<Path> walk(Path p) {
Stream<Path> s=Stream.of(p);
if(Files.isDirectory(p)) try {
DirectoryStream<Path> ds = Files.newDirectoryStream(p);
s=Stream.concat(s, StreamSupport.stream(ds.spliterator(), false)
.flatMap(FileTree::walk)
.onClose(()->{ try { ds.close(); } catch(IOException ex) {} }));
} catch(IOException ex) {}
return s;
}
// in case you don't want to ignore exceptions silently
public static Stream<Path> walk(Path p, BiConsumer<Path,IOException> handler) {
Stream<Path> s=Stream.of(p);
if(Files.isDirectory(p)) try {
DirectoryStream<Path> ds = Files.newDirectoryStream(p);
s=Stream.concat(s, StreamSupport.stream(ds.spliterator(), false)
.flatMap(sub -> walk(sub, handler))
.onClose(()->{ try { ds.close(); }
catch(IOException ex) { handler.accept(p, ex); } }));
} catch(IOException ex) { handler.accept(p, ex); }
return s;
}
// and with depth limit
public static Stream<Path> walk(
Path p, int maxDepth, BiConsumer<Path,IOException> handler) {
Stream<Path> s=Stream.of(p);
if(maxDepth>0 && Files.isDirectory(p)) try {
DirectoryStream<Path> ds = Files.newDirectoryStream(p);
s=Stream.concat(s, StreamSupport.stream(ds.spliterator(), false)
.flatMap(sub -> walk(sub, maxDepth-1, handler))
.onClose(()->{ try { ds.close(); }
catch(IOException ex) { handler.accept(p, ex); } }));
} catch(IOException ex) { handler.accept(p, ex); }
return s;
}
}
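With that in place, the test's provideStream could be rewritten along these lines (a sketch; the depth limit is an arbitrary safeguard because, unlike Files.walk, this factory does not detect symlink loops by itself, and the handler simply logs and moves on):
private Stream<Path> provideStream(Path rootPath) {
    return FileTree.walk(rootPath, 32,
            (path, exception) -> System.err.println("skipping " + path + ": " + exception))
        .sorted();
}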
I am building a collection program in Java that collects data from websites using their APIs. I am encountering a problem where it hangs on an HTTP call. I tried to work around it by executing the HTTP call through an executor service with a timeout. That doesn't seem to work, as it keeps timing out and retrying. I figured it might be something to do with the API, so after a retry I reinitialize a whole new object per website API. Still no solution. I am trying to identify the root cause of this but can't seem to put my finger on it.
Here is a look at my Flickr manager class that handles the calls to Flickr:
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collection;
import java.util.Collections;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.scribe.exceptions.OAuthConnectionException;
import com.flickr4java.flickr.Flickr;
import com.flickr4java.flickr.FlickrException;
import com.flickr4java.flickr.FlickrRuntimeException;
import com.flickr4java.flickr.REST;
import com.flickr4java.flickr.RequestContext;
import com.flickr4java.flickr.auth.Auth;
import com.flickr4java.flickr.auth.Permission;
import com.flickr4java.flickr.people.User;
import com.flickr4java.flickr.photos.Exif;
import com.flickr4java.flickr.photos.Extras;
import com.flickr4java.flickr.photos.Photo;
import com.flickr4java.flickr.photos.PhotoList;
import com.flickr4java.flickr.photos.SearchParameters;
import com.flickr4java.flickr.photos.Size;
import com.google.common.util.concurrent.RateLimiter;
public class FlickrManager {
private final static Logger LOG = Logger.getLogger(FlickrManager.class.getName());
private final static ExecutorService executorService = Executors.newSingleThreadExecutor();
private Flickr flickr;
private final int MAX_PER_PAGE = 500;
private final RateLimiter rateLimiter;
private String ApiKey;
private String ApiSecret;
private String authToken;
private String authTokenSecret;
private Integer hostPort;
private String hostAddress;
private String httpScheme;
public FlickrManager(Flickr flickr, double apiCallsPerSecond) throws FlickrException {
this.flickr = flickr;
flickr.getTestInterface().echo(Collections.emptyMap());
//get flickr info to reinitialize flickr object if necessary
this.ApiKey = flickr.getApiKey();
this.ApiSecret = flickr.getSharedSecret();
this.hostPort = flickr.getTransport().getPort();
this.hostAddress = flickr.getTransport().getHost();
this.httpScheme = flickr.getTransport().getScheme();
if(flickr.getAuth() != null){
this.authToken = flickr.getAuth().getToken();
this.authTokenSecret = flickr.getAuth().getTokenSecret();
}
this.rateLimiter = RateLimiter.create(apiCallsPerSecond);
}
private void initialize(){
this.flickr = null;
REST rest = new REST(this.hostAddress,this.hostPort);
rest.setScheme(this.httpScheme);
this.flickr = new Flickr(this.ApiKey, this.ApiSecret,rest);
if(this.authToken != null && this.authTokenSecret != null){
RequestContext requestContext = RequestContext.getRequestContext();
Auth auth = new Auth();
auth.setPermission(Permission.READ);
auth.setToken(this.authToken);
auth.setTokenSecret(this.authTokenSecret);
requestContext.setAuth(auth);
flickr.setAuth(auth);
}
}
public User getUserInfo(String flickrProfileId) throws FlickrException{
return doFlickrAction(new CallableFlickrTask<User>(){
@Override
public User execute() throws FlickrException {
return flickr.getPeopleInterface().getInfo(flickrProfileId);
}
});
}
public PhotoList<Photo> search(SearchParameters params, int page) throws FlickrException{
return doFlickrAction(new CallableFlickrTask<PhotoList<Photo>>(){
@Override
public PhotoList<Photo> execute() throws FlickrException {
return flickr.getPhotosInterface().search(params, MAX_PER_PAGE, page);
}
});
}
public PhotoList<Photo> getUserPhotos(String userNSID, int page) throws FlickrException{
return doFlickrAction(new CallableFlickrTask<PhotoList<Photo>>(){
@Override
public PhotoList<Photo> execute() throws FlickrException {
return flickr.getPeopleInterface().getPhotos(
userNSID,
null, null, null, null, null,
Flickr.CONTENTTYPE_PHOTO, null,
Extras.ALL_EXTRAS, 100, page);
}
});
}
// Catch the exception inside the function for failure to get exif
public Collection<Exif> getPhotoExif(Photo photo) throws FlickrException, FlickrRuntimeException {
return doFlickrAction(new CallableFlickrTask<Collection<Exif>>(){
@Override
public Collection<Exif> execute() throws FlickrException {
return flickr.getPhotosInterface().getExif(photo.getId(),photo.getSecret());
}
});
}
public Collection<Size> getAvailablePhotoSizes(Photo photo) throws FlickrException{
return doFlickrAction(new CallableFlickrTask<Collection<Size>>(){
@Override
public Collection<Size> execute() throws FlickrException {
return flickr.getPhotosInterface().getSizes(photo.getId());
}
});
}
private abstract class CallableFlickrTask<T> {
public abstract T execute() throws FlickrException, FlickrRuntimeException;
}
private <T> T doFlickrAction(CallableFlickrTask<T> callable) throws FlickrException {
while(true){
rateLimiter.acquire();
Future<T> future = executorService.submit(new Callable<T>() {
@Override
public T call() throws Exception {
return callable.execute();
}});
try {
return future.get(5, TimeUnit.MINUTES);
} catch (InterruptedException e) {
LOG.log(Level.INFO,"Interrupted exception: {0}",e.getMessage());
initialize(); //initialize if it's been interrupted
} catch (ExecutionException e) {
Throwable cause = e.getCause();
if( cause instanceof UnknownHostException ||
cause instanceof SocketException ||
cause instanceof OAuthConnectionException ){
//sleep and retry
LOG.log(Level.INFO,"Unknown Host or Socket exception. Retry: {0}",e.getMessage());
try {
Thread.sleep(10000);
initialize();
} catch (InterruptedException ex) {
LOG.log(Level.INFO, "Thread sleep was interrupted exception: {0}", ex.getMessage());
}
}
//if it's not of the above exceptions, then rethrow
else if (cause instanceof FlickrException) {
throw (FlickrException) cause;
}
else {
throw new IllegalStateException(e);
}
} catch (TimeoutException e) {
LOG.log(Level.INFO,"Timeout Exception: {0}",e.getMessage());
initialize(); //initialize again after timeout
}
}
}
}
I also used JVisualVM to get a look at what the collection program is doing while it's hanging. The thread dump is here: Thread dump
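One detail worth noting about doFlickrAction above: future.get(5, TimeUnit.MINUTES) throwing a TimeoutException does not stop the already-submitted task, and because the executor has only a single thread, every retried submit is queued behind the still-running call and times out as well. A hypothetical sketch of the timeout handling that at least attempts to interrupt the hung task (an illustration against the code above, not a confirmed fix for whatever makes the HTTP call hang):
// hypothetical variant of the timeout handling in doFlickrAction (reuses the fields above)
private <T> T runWithTimeout(Callable<T> task) throws Exception {
    Future<T> future = executorService.submit(task);
    try {
        return future.get(5, TimeUnit.MINUTES);
    } catch (TimeoutException e) {
        // interrupt the hung call; otherwise it keeps occupying the single
        // executor thread and every retried submit just queues up behind it
        future.cancel(true);
        initialize();
        throw e;
    }
}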