Some background first.
I am working on a project in which a servlet releases crawlers on a large number of text files in a file system. I was thinking of dividing the load across multiple threads, for example:
A crawler enters a directory and finds 3 files and 6 directories. It starts processing the files and starts a thread with a new crawler for each of the directories. So from my creator class I would create a single crawler on a base directory. The crawler would assess the workload and, if needed, spawn another crawler on another thread.
My crawler class looks like this:
package com.fujitsu.spider;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
public class DocumentSpider implements Runnable, Serializable {
private static final long serialVersionUID = 8401649393078703808L;
private Spidermode currentMode = null;
private String URL = null;
private String[] terms = null;
private float score = 0;
private ArrayList<SpiderDataPair> resultList = null;
public enum Spidermode {
FILE, DIRECTORY
}
public DocumentSpider(String resourceURL, Spidermode mode, ArrayList<SpiderDataPair> resultList) {
currentMode = mode;
setURL(resourceURL);
this.setResultList(resultList);
}
@Override
public void run() {
try {
if (currentMode == Spidermode.FILE) {
doCrawlFile();
} else {
doCrawlDirectory();
}
} catch (Exception e) {
e.printStackTrace();
}
System.out.println("SPIDER # " + URL + " HAS FINISHED.");
}
public Spidermode getCurrentMode() {
return currentMode;
}
public void setCurrentMode(Spidermode currentMode) {
this.currentMode = currentMode;
}
public String getURL() {
return URL;
}
public void setURL(String uRL) {
URL = uRL;
}
public void doCrawlFile() throws Exception {
File target = new File(URL);
if (target.isDirectory()) {
throw new Exception(
"This URL points to a directory while the spider is in FILE mode. Please change this spider to DIRECTORY mode.");
}
procesFile(target);
}
public void doCrawlDirectory() throws Exception {
File baseDir = new File(URL);
if (!baseDir.isDirectory()) {
throw new Exception(
"This URL points to a FILE while the spider is in DIRECTORY mode. Please change this spider to FILE mode.");
}
File[] directoryContent = baseDir.listFiles();
for (File f : directoryContent) {
if (f.isDirectory()) {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.DIRECTORY, this.resultList);
spider.terms = this.terms;
(new Thread(spider)).start();
} else {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.FILE, this.resultList);
spider.terms = this.terms;
(new Thread(spider)).start();
}
}
}
public void procesDirectory(String target) throws IOException {
File base = new File(target);
File[] directoryContent = base.listFiles();
for (File f : directoryContent) {
if (f.isDirectory()) {
procesDirectory(f.getPath());
} else {
procesFile(f);
}
}
}
public void procesFile(File target) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(target));
String line;
while ((line = br.readLine()) != null) {
String[] words = line.split(" ");
for (String currentWord : words) {
for (String a : terms) {
if (a.toLowerCase().equalsIgnoreCase(currentWord)) {
score += 1f;
}
;
if (currentWord.toLowerCase().contains(a)) {
score += 1f;
}
;
}
}
}
br.close();
resultList.add(new SpiderDataPair(this, URL));
}
public String[] getTerms() {
return terms;
}
public void setTerms(String[] terms) {
this.terms = terms;
}
public float getScore() {
return score;
}
public void setScore(float score) {
this.score = score;
}
public ArrayList<SpiderDataPair> getResultList() {
return resultList;
}
public void setResultList(ArrayList<SpiderDataPair> resultList) {
this.resultList = resultList;
}
}
The problem I am facing is that my root crawler holds the list of results from every crawler, which I want to process further. The operation that processes this list is called from the servlet (or from the main method in this example). However, the operation is always called before all of the crawlers have finished, so the results are processed too soon and the data is incomplete.
I tried solving this with the join methods, but unfortunately I can't seem to figure it out.
package com.fujitsu.spider;
import java.util.ArrayList;
import com.fujitsu.spider.DocumentSpider.Spidermode;
public class Main {
public static void main(String[] args) throws InterruptedException {
ArrayList<SpiderDataPair> results = new ArrayList<SpiderDataPair>();
String [] terms = {"SERVER","CHANGE","MO"};
DocumentSpider spider1 = new DocumentSpider("C:\\Users\\Mark\\workspace\\Spider\\Files", Spidermode.DIRECTORY, results);
spider1.setTerms(terms);
DocumentSpider spider2 = new DocumentSpider("C:\\Users\\Mark\\workspace\\Spider\\File2", Spidermode.DIRECTORY, results);
spider2.setTerms(terms);
Thread t1 = new Thread(spider1);
Thread t2 = new Thread(spider2);
t1.start();
t1.join();
t2.start();
t2.join();
for(SpiderDataPair d : spider1.getResultList()){
System.out.println("PATH -> " + d.getFile() + " SCORE -> " + d.getSpider().getScore());
}
for(SpiderDataPair d : spider2.getResultList()){
System.out.println("PATH -> " + d.getFile() + " SCORE -> " + d.getSpider().getScore());
}
}
}
TL;DR
I really wish to understand this subject, so any help would be immensely appreciated!
You need a couple of changes in your code:
In the spider:
List<Thread> threads = new LinkedList<Thread>();
for (File f : directoryContent) {
if (f.isDirectory()) {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.DIRECTORY, this.resultList);
spider.terms = this.terms;
Thread thread = new Thread(spider);
threads.add(thread);
thread.start();
} else {
DocumentSpider spider = new DocumentSpider(f.getPath(), Spidermode.FILE, this.resultList);
spider.terms = this.terms;
Thread thread = new Thread(spider);
threads.add(thread);
thread.start();
}
}
for (Thread thread : threads) thread.join();
The idea is to create a new thread for each spider and start it. Once they are all running, you wait until each one is done before the spider itself finishes. This way each spider thread keeps running until all of its work is done (thus the top thread runs until all children and their children are finished).
You also need to change your runner so that it runs the two spiders in parallel instead of one after another:
Thread t1 = new Thread(spider1);
Thread t2 = new Thread(spider2);
t1.start();
t2.start();
t1.join();
t2.join();
You should use a higher-level library than bare Thread for this task. I would suggest looking into ExecutorService in particular and all of java.util.concurrent generally. There are abstractions there that can manage all of the threading issues while providing well-formed tasks a properly protected environment in which to run.
For your specific problem, I would recommend some sort of blocking queue of tasks and a standard producer-consumer architecture. Each task knows how to determine if its path is a file or directory. If it is a file, process the file; if it is a directory, crawl the directory's immediate contents and enqueue new tasks for each sub-path. You could also use some properly-synchronized shared state to cap the number of files processed, depth, etc. Also, the service provides the ability to await termination of its tasks, making the "join" simpler.
With this architecture, you decouple the notion of threads and thread management (handled by the ExecutorService) from your business logic of tasks (typically a Runnable or Callable). The service itself can be tuned in how it instantiates threads, such as a fixed maximum number of threads or a scalable number depending on how many concurrent tasks exist (see the factory methods on java.util.concurrent.Executors). Threads, which are more expensive than the Runnables they execute, are re-used to conserve resources.
If your objective is primarily something functional that works in production quality, then the library is the way to go. However, if your objective is to understand the lower-level details of thread management, then you may want to investigate the use of latches and perhaps thread groups to manage them at a lower level, exposing the details of the implementation so you can work with the details.
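As a rough illustration of the ExecutorService approach described above, here is a minimal sketch. The class name PoolCrawler, the pool size of 4 and the result queue are assumptions made for the example, not part of the question's code; the sketch only collects file paths, where real code would score each file. A counter of outstanding tasks plus a latch stands in for the "join": each task is counted before it is submitted, so the count can only reach zero once the whole tree has been visited.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: PoolCrawler is an illustrative name, not the asker's class.
// An instance is meant to be used for a single crawl.
final class PoolCrawler {
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // assumed size
    private final AtomicInteger pending = new AtomicInteger();   // tasks submitted but not finished
    private final CountDownLatch done = new CountDownLatch(1);   // released when pending hits 0
    private final Queue<Path> files = new ConcurrentLinkedQueue<>(); // thread-safe result list

    private void submit(Path path) {
        pending.incrementAndGet(); // count the task before it can possibly finish
        pool.execute(() -> {
            try {
                visit(path);
            } finally {
                // Children were counted inside visit(), so 0 means the whole tree is done.
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }

    private void visit(Path path) {
        if (Files.isDirectory(path)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
                for (Path child : children) {
                    submit(child); // one task per immediate child
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        } else {
            files.add(path); // real code would read and score the file here
        }
    }

    // Blocks until every task has finished, then returns the collected files.
    Queue<Path> crawl(Path root) throws InterruptedException {
        submit(root);
        done.await();
        pool.shutdown();
        return files;
    }
}
```

Usage might look like new PoolCrawler().crawl(Paths.get(baseDir)); the call returns only after every queued task has completed, so the results can be processed right after it.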
I can monitor a directory by registering with a WatchKey (there are tons of examples on the web); however, this watcher catches every single event. E.g. on Windows, if I am monitoring the d:/temp dir and I create a new .txt file and rename it, I get the following events:
ENTRY_CREATE: d:\temp\test\New Text Document.txt
ENTRY_MODIFY: d:\temp\test
ENTRY_DELETE: d:\temp\test\New Text Document.txt
ENTRY_CREATE: d:\temp\test\test.txt
ENTRY_MODIFY: d:\temp\test
I want to perform an action when the new file is created or updated. However I don't want the action to run 5 times in the above example.
My 1st idea: As I only need to run the action (in this case a push to a private Git server) once every now and then (e.g. check every 10 seconds whether there are changes to the monitored directory, and only then perform the push), I thought of having an object with a boolean field that I can get and set from separate threads.
Now this works kind of OK (unless the gurus can help educate me as to why this is a terrible idea). The problem is that if a file event is caught while the SendToGit thread's operation is running, the operation sets the "found" field to false when it completes. Immediately thereafter, one of the other events is caught (as in the example above) and sets the "found" field to true again. This is not ideal, as I will then run the SendToGit operation again unnecessarily.
My 2nd idea: Investigate pausing the check for changes in the MonitorFolder thread until the SendToGit operation is complete (i.e. keep checking whether the ChangesFound "found" field has been set back to false, and start checking for changes again once it is).
Questions
Is this an acceptable way to go or have I gone down a rabbit hole with no hope of return?
If I go down the road of my 2nd idea what happens if I am busy with the SendToGit operation and a change is made in the monitoring folder? I suspect that this will not be identified and I may miss changes.
The Rest of the code
ChangesFound.java
package com.acme;
public class ChangesFound {
private boolean found = false;
public boolean wereFound() {
return this.found;
}
public void setFound(boolean commitToGit) {
this.found = commitToGit;
}
}
In my main app I start 2 threads.
MonitorFolder.java: monitors the directory and, when watcher events are found, sets the ChangesFound variable "found" to true.
SendToGit.java: every 10 seconds, checks whether the ChangesFound variable "found" is true and, if it is, performs the push (or in this case just prints a message).
Here is my App that starts the threads:
package com.acme;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
public class App {
private static ChangesFound chg;
public static void main(String[] args) throws IOException {
String dirToMonitor = "D:/Temp";
boolean recursive = true;
chg = new ChangesFound();
Runnable r = new SendToGit(chg);
new Thread(r).start();
Path dir = Paths.get(dirToMonitor);
Runnable m = new MonitorFolder(chg, dir, recursive);
new Thread(m).start();
}
}
SendToGit.java
package com.acme;
public class SendToGit implements Runnable {
private ChangesFound changes;
public SendToGit(ChangesFound chg) {
changes = chg;
}
public void run() {
while (true) {
try {
Thread.sleep(10000);
} catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
System.out.println(java.time.LocalDateTime.now() + " [SendToGit] waking up.");
if (changes.wereFound()) {
System.out.println("\t***** CHANGES FOUND push to Git.");
changes.setFound(false);
} else {
System.out.println("\t***** Nothing changed.");
}
}
}
}
MonitorFolder.java (Apologies for the long class I only added this here in case it helps someone else.)
package com.acme;
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import static java.nio.file.LinkOption.NOFOLLOW_LINKS;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;
public class MonitorFolder implements Runnable {
private static WatchService watcher;
private static Map<WatchKey, Path> keys;
private static boolean recursive;
private static boolean trace = false;
private static boolean commitGit = false;
private static ChangesFound changes;
@SuppressWarnings("unchecked")
static <T> WatchEvent<T> cast(WatchEvent<?> event) {
return (WatchEvent<T>) event;
}
/**
* Creates a WatchService and registers the given directory
*/
MonitorFolder(ChangesFound chg, Path dir, boolean rec) throws IOException {
changes = chg;
watcher = FileSystems.getDefault().newWatchService();
keys = new HashMap<WatchKey, Path>();
recursive = rec;
if (recursive) {
System.out.format("[MonitorFolder] Scanning %s ...\n", dir);
registerAll(dir);
System.out.println("Done.");
} else {
register(dir);
}
// enable trace after initial registration
this.trace = true;
}
/**
* Register the given directory with the WatchService
*/
private static void register(Path dir) throws IOException {
WatchKey key = dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE, ENTRY_MODIFY);
if (trace) {
Path prev = keys.get(key);
if (prev == null) {
System.out.format("register: %s\n", dir);
} else {
if (!dir.equals(prev)) {
System.out.format("update: %s -> %s\n", prev, dir);
}
}
}
keys.put(key, dir);
}
/**
* Register the given directory, and all its sub-directories, with the
* WatchService.
*/
private static void registerAll(final Path start) throws IOException {
// register directory and sub-directories
Files.walkFileTree(start, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
throws IOException {
register(dir);
return FileVisitResult.CONTINUE;
}
});
}
/**
* Process all events for keys queued to the watcher
*/
public void run() {
for (;;) {
// wait for key to be signalled
WatchKey key;
try {
key = watcher.take();
} catch (InterruptedException x) {
return;
}
Path dir = keys.get(key);
if (dir == null) {
System.err.println("WatchKey not recognized!!");
continue;
}
for (WatchEvent<?> event : key.pollEvents()) {
WatchEvent.Kind kind = event.kind();
// TBD - provide example of how OVERFLOW event is handled
if (kind == OVERFLOW) {
System.out.println("Something about Overflow");
continue;
}
// Context for directory entry event is the file name of entry
WatchEvent<Path> ev = cast(event);
Path name = ev.context();
Path child = dir.resolve(name);
// print out event and set ChangesFound object Found parameter to True
System.out.format("[MonitorFolder] " + java.time.LocalDateTime.now() + " - %s: %s\n", event.kind().name(), child);
changes.setFound(true);
// if directory is created, and watching recursively, then
// register it and its sub-directories
if (recursive && (kind == ENTRY_CREATE)) {
try {
if (Files.isDirectory(child, NOFOLLOW_LINKS)) {
registerAll(child);
}
} catch (IOException x) {
// ignore to keep sample readable
}
}
}
// reset key and remove from set if directory no longer accessible
boolean valid = key.reset();
if (!valid) {
keys.remove(key);
// all directories are inaccessible
if (keys.isEmpty()) {
System.out.println("keys.isEmpty");
break;
}
}
}
}
}
Both of your strategies will lead to issues because the WatchService is very verbose and sends many messages when maybe one or two are actually needed for your downstream handling, so sometimes you may do unnecessary work or miss events.
When using WatchService you could collate multiple notifications together and pass them on as ONE event listing the sets of recent deletes, creates and updates:
DELETE followed by CREATE => sent as UPDATE
CREATE followed by MODIFY => sent as CREATE
CREATE or MODIFY followed by DELETE => sent as DELETE
Instead of calling WatchService.take() and acting on each message, use WatchService.poll(timeout) and act only when nothing is returned, treating the union of the preceding events as one set rather than handling them individually after each successful poll.
It is easier to decouple the problems as two components so that you don't repeat the WatchService code the next time you need it:
A watch manager which handles watch service + dir registrations and collates the duplicates to send to event listeners as ONE group
A Listener class to receive the group of changes and act on the set.
This example may help illustrate: see WatchExample, the manager which sets up the registrations BUT passes on far fewer events to the callback defined by setListener. You could set up MonitorFolder like WatchExample to reduce the events discovered, and turn your SendToGit code into a Listener which is called on demand with the aggregated set of fileChange(deletes, creates, updates).
public static void main(String[] args) throws IOException, InterruptedException {
final List<Path> dirs = Arrays.stream(args).map(Path::of).map(Path::toAbsolutePath).collect(Collectors.toList());
Kind<?> [] kinds = { StandardWatchEventKinds.ENTRY_CREATE, StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE};
// Should launch WatchExample PER Filesystem:
WatchExample w = new WatchExample();
w.setListener(WatchExample::fireEvents);
for(Path dir : dirs)
w.register(kinds, dir);
// For 2 or more WatchExample use: new Thread(w[n]::run).start();
w.run();
}
public class WatchExample implements Runnable {
private final Set<Path> created = new LinkedHashSet<>();
private final Set<Path> updated = new LinkedHashSet<>();
private final Set<Path> deleted = new LinkedHashSet<>();
private volatile boolean appIsRunning = true;
// Decide how sensitive the polling is:
private final int pollmillis = 100;
private WatchService ws;
private Listener listener = WatchExample::fireEvents;
@FunctionalInterface
interface Listener
{
public void fileChange(Set<Path> deleted, Set<Path> created, Set<Path> modified);
}
WatchExample() {
}
public void setListener(Listener listener) {
this.listener = listener;
}
public void shutdown() {
System.out.println("shutdown()");
this.appIsRunning = false;
}
public void run() {
System.out.println();
System.out.println("run() START watch");
System.out.println();
try(WatchService autoclose = ws) {
while(appIsRunning) {
boolean hasPending = created.size() + updated.size() + deleted.size() > 0;
System.out.println((hasPending ? "ws.poll("+pollmillis+")" : "ws.take()")+" as hasPending="+hasPending);
// Use poll if last cycle has some events, as take() may block
WatchKey wk = hasPending ? ws.poll(pollmillis,TimeUnit.MILLISECONDS) : ws.take();
if (wk != null) {
for (WatchEvent<?> event : wk.pollEvents()) {
Path parent = (Path) wk.watchable();
Path eventPath = (Path) event.context();
storeEvent(event.kind(), parent.resolve(eventPath));
}
boolean valid = wk.reset();
if (!valid) {
System.out.println("Check the path, dir may be deleted "+wk);
}
}
System.out.println("PENDING: cre="+created.size()+" mod="+updated.size()+" del="+deleted.size());
// This only sends new notifications when there was NO event this cycle:
if (wk == null && hasPending) {
listener.fileChange(deleted, created, updated);
deleted.clear();
created.clear();
updated.clear();
}
}
}
catch (InterruptedException e) {
System.out.println("Watch was interrupted, sending final updates");
fireEvents(deleted, created, updated);
}
catch (IOException e) {
throw new UncheckedIOException(e);
}
System.out.println("run() END watch");
}
public void register(Kind<?> [] kinds, Path dir) throws IOException {
System.out.println("register watch for "+dir);
// If dirs are from different filesystems WatchService will give errors later
if (this.ws == null) {
ws = dir.getFileSystem().newWatchService();
}
dir.register(ws, kinds);
}
/**
* Save event for later processing by event kind EXCEPT for:
* <li>DELETE followed by CREATE => store as MODIFY
* <li>CREATE followed by MODIFY => store as CREATE
* <li>CREATE or MODIFY followed by DELETE => store as DELETE
*/
private void
storeEvent(Kind<?> kind, Path path) {
System.out.println("STORE "+kind+" path:"+path);
boolean cre = false;
boolean mod = false;
boolean del = kind == StandardWatchEventKinds.ENTRY_DELETE;
if (kind == StandardWatchEventKinds.ENTRY_CREATE) {
mod = deleted.contains(path);
cre = !mod;
}
else if (kind == StandardWatchEventKinds.ENTRY_MODIFY) {
cre = created.contains(path);
mod = !cre;
}
addOrRemove(created, cre, path);
addOrRemove(updated, mod, path);
addOrRemove(deleted, del, path);
}
// Add or remove from the set:
private static void addOrRemove(Set<Path> set, boolean add, Path path) {
if (add) set.add(path);
else set.remove(path);
}
public static void fireEvents(Set<Path> deleted, Set<Path> created, Set<Path> modified) {
System.out.println();
System.out.println("fireEvents START");
for (Path path : deleted)
System.out.println(" DELETED: "+path);
for (Path path : created)
System.out.println(" CREATED: "+path);
for (Path path : modified)
System.out.println(" UPDATED: "+path);
System.out.println("fireEvents END");
System.out.println();
}
}
I am monitoring several (about 15) paths for incoming files using the Apache Commons FileAlterationMonitor. These incoming files can come in batches of anywhere between 1 and 500 files at a time. I have everything set up and the application monitors the folders as expected; I have it set to poll the folders every minute. My issue is that, as expected, the listener that I have set up alerts for each incoming file when all I really need, and want, is to know when a new batch of files comes in. So I would like to receive a single alert as opposed to up to 500 at a time.
Does anyone have any ideas for how to control the number of alerts, or only pick up the first or last notification, or something to that effect? I would like to stick with the FileAlterationMonitor if at all possible because it will be running for long periods and, from what I can tell in testing so far, it doesn't seem to put a heavy load on the system or slow the rest of the application down. But I am definitely open to other ideas if what I'm looking for isn't possible with the FileAlterationMonitor.
public class FileMonitor{
private final String newDirectory;
private FileAlterationMonitor monitor;
private final Alerts gui;
private final String provider;
public FileMonitor (String d, Alerts g, String pro) throws Exception{
newDirectory = d;
gui = g;
provider = pro;
}
public void startMonitor() throws Exception{
// Directory to monitor
final File directory = new File(newDirectory);
// create new observer
FileAlterationObserver fao = new FileAlterationObserver(directory);
// add listener to observer
fao.addListener(new FileAlterationListenerImpl(gui, provider));
// wait 1 minute between folder polls.
monitor = new FileAlterationMonitor(60000);
monitor.addObserver(fao);
monitor.start();
}
}
public class FileAlterationListenerImpl implements FileAlterationListener{
private final Alerts gui;
private final String provider;
private final LogFiles monitorLogs;
public FileAlterationListenerImpl(Alerts g, String pro){
gui = g;
provider = pro;
monitorLogs = new LogFiles();
}
@Override
public void onStart(final FileAlterationObserver observer){
System.out.println("The FileListener has started on: " + observer.getDirectory().getAbsolutePath());
}
@Override
public void onDirectoryCreate(File file) {
}
@Override
public void onDirectoryChange(File file) {
}
@Override
public void onDirectoryDelete(File file) {
}
@Override
public void onFileCreate(File file) {
try{
switch (provider){
case "Spectrum": gui.alertsAreaAppend("New/Updated schedules available for Spectrum zones!\r\n");
monitorLogs.appendNewLogging("New/Updated schedules available for Spectrum zones!\r\n");
break;
case "DirecTV ZTA": gui.alertsAreaAppend("New/Updated schedules available for DirecTV ZTA zones!\r\n");
monitorLogs.appendNewLogging("New/Updated schedules available for DirecTV ZTA zones!\r\n");
break;
case "DirecTV RSN": gui.alertsAreaAppend("New/Updated schedules available for DirecTV RSN zones!\r\n");
monitorLogs.appendNewLogging("New/Updated schedules available for DirecTV RSN zones!\r\n");
break;
case "Suddenlink": gui.alertsAreaAppend("New/Updated schedules available for Suddenlink zones!\r\n");
monitorLogs.appendNewLogging("New/Updated schedules available for Suddenlink zones!\r\n");
break;
}
}catch (IOException e){}
}
@Override
public void onFileChange(File file) {
}
Above is the FileMonitor class and overridden FileAlterationListener I have so far.
Any suggestions would be greatly appreciated.
Here's a quick and crude implementation:
public class FileAlterationListenerAlterThrottler {
private static final int DEFAULT_THRESHOLD_MS = 5000;
private final int thresholdMs;
private final Map<String, Long> providerLastFileProcessedAt = new HashMap<>();
public FileAlterationListenerAlterThrottler() {
this(DEFAULT_THRESHOLD_MS);
}
public FileAlterationListenerAlterThrottler(int thresholdMs) {
this.thresholdMs = thresholdMs;
}
public synchronized boolean shouldAlertFor(String provider) {
long now = System.currentTimeMillis();
long last = providerLastFileProcessedAt.computeIfAbsent(provider, x -> 0L);
if (now - last < thresholdMs) {
return false;
}
providerLastFileProcessedAt.put(provider, now);
return true;
}
}
And a quicker and cruder driver:
public class Test {
public static void main(String[] args) throws Exception {
int myThreshold = 1000;
FileAlterationListenerAlterThrottler throttler = new FileAlterationListenerAlterThrottler(myThreshold);
for (int i = 0; i < 3; i++) {
doIt(throttler);
}
Thread.sleep(1500);
doIt(throttler);
}
private static void doIt(FileAlterationListenerAlterThrottler throttler) {
boolean shouldAlert = throttler.shouldAlertFor("Some Provider");
System.out.println("Time now: " + System.currentTimeMillis());
System.out.println("Should alert? " + shouldAlert);
System.out.println();
}
}
Yields:
Time now: 1553739126557
Should alert? true
Time now: 1553739126557
Should alert? false
Time now: 1553739126557
Should alert? false
Time now: 1553739128058
Should alert? true
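To show how a throttle like this might sit between the listener and the GUI, here is a self-contained sketch. ThrottledAlerter, its injected clock and the alert sink are hypothetical names introduced for illustration; they are not Commons IO classes and not the code above. The idea is that onFileCreate would call fileArrived(provider) instead of alerting directly, so a burst of files for one provider produces a single alert per threshold window.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch: ThrottledAlerter is an illustrative name, not a Commons IO class.
final class ThrottledAlerter {
    private final long thresholdMs;
    private final LongSupplier clock;         // injected so the timing can be tested
    private final Consumer<String> alertSink; // assumed hook, e.g. appends a message to the GUI
    private final Map<String, Long> lastAlertAt = new HashMap<>();

    ThrottledAlerter(long thresholdMs, LongSupplier clock, Consumer<String> alertSink) {
        this.thresholdMs = thresholdMs;
        this.clock = clock;
        this.alertSink = alertSink;
    }

    // Intended to be called once per incoming file, e.g. from onFileCreate(File).
    synchronized void fileArrived(String provider) {
        long now = clock.getAsLong();
        Long last = lastAlertAt.get(provider);
        if (last != null && now - last < thresholdMs) {
            return; // still inside the quiet window for this provider
        }
        lastAlertAt.put(provider, now);
        alertSink.accept(provider); // first file of a new batch triggers the alert
    }
}
```

In production the clock would typically be System::currentTimeMillis. Passing it in keeps the class testable without sleeping.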
I'm quite new to Java (I studied it at university, but that was version 2).
Now I've developed an application that downloads files from S3 in parallel. I've used ExecutorService and Runnable to download multiple files in parallel, like this:
public class DownloaderController {
private AmazonS3 s3Client;
private ExecutorService fixedPool;
private TransferManager dlManager;
private List<MultipleFileDownload> downloads = new ArrayList<>();
public DownloaderController() {
checkForNewWork();
}
public void checkForNewWork(){
Provider1 provider = new Provider1();
fixedPool = Executors.newFixedThreadPool(4);
List<Download> providedDownloadList = provider.toBeDownloaded();
for (Download temp : providedDownloadList) {
if (!downloadData.contains(temp)) {
fixedPool.submit(download.downloadCompletedHandler(s3Client));
}
}
}
}
public void printToTextArea(String msg){
Date now = new Date();
if ( !DateUtils.isSameDay(this.lastLogged, now)){
this._doLogRotate();
}
this.lastLogged = now;
SimpleDateFormat ft = new SimpleDateFormat("dd/MM/yyyy H:mm:ss");
String output = "[ " + ft.format(now) + " ] " + msg + System.getProperty("line.separator");
Platform.runLater(() -> {
//this is a FXML object
statusTextArea.appendText(output);
});
}
}
public class Provider1 implements downloadProvider {
}
public abstract class Download {
abstract Runnable downloadCompletedHandler(AmazonS3 s3Client);
}
public class DownloadProvider1 extends Download {
@Override
public Runnable downloadCompletedHandler(AmazonS3 s3Client){
Runnable downloadwork = () -> {
ObjectListing list = s3Client.listObjects(this.bucket,this.getFolder());
List<S3ObjectSummary> objects = list.getObjectSummaries();
AtomicLong workSize = new AtomicLong(0);
List<DeleteObjectsRequest.KeyVersion> keys = new ArrayList<>();
objects.forEach(obj -> {
workSize.getAndAdd(obj.getSize());
keys.add((new DeleteObjectsRequest.KeyVersion(obj.getKey())));
});
MultipleFileDownload fileDownload = dlManager.downloadDirectory("myBucket","folder","outputDirectory");
try {
fileDownload.waitForCompletion();
} catch (Exception e){
printToTextArea("Exception while download from AmazonS3");
}
};
return downloadwork;
}
}
In the DownloaderController, every minute I call a function that adds Download objects to a list of folders that have to be downloaded from S3. When a new Download is added, it is also submitted to the ExecutorService pool. The Download object returns the code to execute to download the folder from S3, and what to do when its download is finished.
My problem is: what is the best way to communicate between the Runnable and the DownloaderController?
Your code does not make entirely clear what the goal is. From what I understand, I would have done it something like this:
public class Download implements Runnable {
private final AmazonS3 s3Client;
public Download(AmazonS3 client) { s3Client = client; }
@Override
public void run() { /* perform download */ }
}
That class does nothing but download the file (cf. Separation of Concerns) and is a Runnable. You can call executorService.submit(new Download(client)) and the download will finish eventually; you can also test it without running it concurrently.
Now, you want a callback method for logging it being finished.
public class LoggingCallback {
public void log() {
System.out.println("finished");
}
}
Also a Runnable (the method doesn't have to be run()).
And, to make sure it's triggered one after the other, maybe
class OneAfterTheOther implements Runnable {
private Runnable first;
private Runnable second;
public OneAfterTheOther(Runnable r1, Runnable r2) {
first = r1; second = r2;
}
public void run() { first.run(); second.run(); }
}
which if submitted like this
Download dl = new Download(client);
LoggingCallback l = new LoggingCallback();
executorService.submit(new OneAfterTheOther(dl::run, l::log));
will do what I think you're trying to do.
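The same "run one thing, then the other" chaining can also be expressed with the JDK's CompletableFuture instead of a hand-written wrapper. This is a different technique from the OneAfterTheOther class above, shown only as a sketch; ChainedDownload and downloadThenNotify are hypothetical names, and the download and callback are plain Runnables standing in for the real S3 work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical sketch: ChainedDownload is an illustrative name, not part of the AWS SDK.
final class ChainedDownload {
    private ChainedDownload() {
    }

    // Runs the download on the pool; the callback runs only if the download
    // completes without throwing. The returned future can be joined or ignored.
    static CompletableFuture<Void> downloadThenNotify(
            Runnable download, Runnable callback, ExecutorService pool) {
        return CompletableFuture.runAsync(download, pool).thenRun(callback);
    }
}
```

A controller could call ChainedDownload.downloadThenNotify(dl, l::log, executorService) and, if it needs to react to failures as well, attach exceptionally(...) or whenComplete(...) to the returned future.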
I have joined the Vert.x lovers; however, the single-threaded model may not work for me, because my server may get 50 file download requests at a given moment. As a workaround I have created this class:
public abstract T onRun() throws Exception;
public abstract void onSuccess(T result);
public abstract void onException();
private static final int poolSize = Runtime.getRuntime().availableProcessors();
private static final long maxExecuteTime = 120000;
private static WorkerExecutor mExecutor;
private static final String BG_THREAD_TAG = "BG_THREAD";
protected RoutingContext ctx;
private boolean isThreadInBackground(){
return Thread.currentThread().getName() != null && Thread.currentThread().getName().equals(BG_THREAD_TAG);
}
//on success will not be called if exception be thrown
public BackgroundExecutor(RoutingContext ctx){
this.ctx = ctx;
if(mExecutor == null){
mExecutor = MyVertxServer.vertx.createSharedWorkerExecutor("my-worker-pool",poolSize,maxExecuteTime);
}
if(!isThreadInBackground()){
/** we are unlocking the lock before res.succeeded , because it might take long and keeps any thread waiting */
mExecutor.executeBlocking(future -> {
try{
Thread.currentThread().setName(BG_THREAD_TAG);
T result = onRun();
future.complete(result);
}catch (Exception e) {
GUI.display(e);
e.printStackTrace();
onException();
future.fail(e);
}
/** false here means they should not be parallel , and will run without order multiple times on same context*/
},false, res -> {
if(res.succeeded()){
onSuccess((T)res.result());
}
});
}else{
GUI.display("AVOIDED DUPLICATE BACKGROUND THREADING");
System.out.println("AVOIDED DUPLICATE BACKGROUND THREADING");
try{
T result = onRun();
onSuccess((T)result);
}catch (Exception e) {
GUI.display(e);
e.printStackTrace();
onException();
}
}
}
This allows the handlers to extend it and use it like this:
public abstract class DefaultFileHandler implements MyHttpHandler{
public abstract File getFile(String suffix);
@Override
public void Handle(RoutingContext ctx, VertxUtils utils, String suffix) {
new BackgroundExecutor<Void>(ctx) {
@Override
public Void onRun() throws Exception {
File file = getFile(URLDecoder.decode(suffix, "UTF-8"));
if(file == null || !file.exists()){
utils.sendResponseAndEnd(ctx.response(),404);
return null;
}else{
utils.sendFile(ctx, file);
}
return null;
}
@Override
public void onSuccess(Void result) {}
@Override
public void onException() {
utils.sendResponseAndEnd(ctx.response(),404);
}
};
}
}
And here is how I initialize my Vert.x server:
vertx.deployVerticle(MainDeployment.class.getCanonicalName(),res -> {
if (res.succeeded()) {
GUI.display("Deployed");
} else {
res.cause().printStackTrace();
}
});
server.requestHandler(router::accept).listen(port);
And here is my MainDeployment class:
public class MainDeployment extends AbstractVerticle{
@Override
public void start() throws Exception {
// Different ways of deploying verticles
// Deploy a verticle and don't wait for it to start
for(Entry<String, MyHttpHandler> entry : MyVertxServer.map.entrySet()){
MyVertxServer.router.route(entry.getKey()).handler(new Handler<RoutingContext>() {
@Override
public void handle(RoutingContext ctx) {
String[] handlerID = ctx.request().uri().split(ctx.currentRoute().getPath());
String suffix = handlerID.length > 1 ? handlerID[1] : null;
entry.getValue().Handle(ctx, new VertxUtils(), suffix);
}
});
}
}
}
This works fine when and where I need it, but I still wonder whether there is a better way to handle concurrency like this in Vert.x. If so, an example would be really appreciated. Thanks a lot.
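For comparison, the onRun / onSuccess / onException shape of the class above can be sketched with nothing but the JDK. This is only an analogue built on `CompletableFuture`, not Vert.x API; `BackgroundTaskSketch` and `runInBackground` are hypothetical names, and the supplier is a stand-in for the blocking file work.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

class BackgroundTaskSketch {
    // Runs blockingWork on the pool, then routes the outcome to exactly one callback.
    static <T> CompletableFuture<Void> runInBackground(
            ExecutorService pool,
            Supplier<T> blockingWork,
            Consumer<T> onSuccess,
            Consumer<Throwable> onException) {
        return CompletableFuture.supplyAsync(blockingWork, pool)
                .<Void>handle((result, error) -> {
                    if (error != null) {
                        onException.accept(error); // the work threw
                    } else {
                        onSuccess.accept(result);  // the work returned normally
                    }
                    return null;
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        runInBackground(pool,
                () -> "file contents",                          // stand-in for a blocking read
                result -> System.out.println("ok: " + result),
                error -> System.out.println("failed: " + error))
            .join();                                            // wait so the demo can exit cleanly
        pool.shutdown();
    }
}
```

One difference worth keeping in mind: here the callbacks normally run on a pool thread, whereas the result handler passed to Vert.x's `executeBlocking` is called back on the original context.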
I don't fully understand your problem and reasons for your solution. Why don't you implement one verticle to handle your http uploads and deploy it multiple times? I think that handling 50 concurrent uploads should be a piece of cake for vert.x.
When deploying a verticle using a verticle name, you can specify the number of verticle instances that you want to deploy:
DeploymentOptions options = new DeploymentOptions().setInstances(16);
vertx.deployVerticle("com.mycompany.MyOrderProcessorVerticle", options);
This is useful for scaling easily across multiple cores. For example you might have a web-server verticle to deploy and multiple cores on your machine, so you want to deploy multiple instances to utilise all the cores.
http://vertx.io/docs/vertx-core/java/#_specifying_number_of_verticle_instances
Vert.x is designed so that concurrency issues do not arise inside a verticle.
In general, Vert.x does not recommend the multi-threaded model,
because it is not easy to handle.
If you choose the multi-threaded model, you have to think about shared data.
If you simply want to spread the work across several event loops,
first check the number of CPU cores on your machine,
and then set the number of instances accordingly:
DeploymentOptions options = new DeploymentOptions().setInstances(4);
vertx.deployVerticle("com.mycompany.MyOrderProcessorVerticle", options);
However, if you have a 4-core CPU, do not deploy more than 4 instances.
Deploying more instances than you have cores will not improve performance.
Vert.x concurrency reference:
http://vertx.io/docs/vertx-core/java/
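A small sketch of the sizing rule described above, in plain Java. The class `InstanceCount` and the helper `cappedInstances` are hypothetical names, and capping at the core count is this answer's rule of thumb rather than a limit enforced by Vert.x.

```java
class InstanceCount {
    // Caps a requested instance count at the number of cores, never going below 1.
    static int cappedInstances(int requested, int cores) {
        return Math.max(1, Math.min(requested, cores));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int instances = cappedInstances(16, cores);
        System.out.println("cores=" + cores + ", instances=" + instances);
        // With Vert.x on the classpath, the value would then be passed on as:
        // new DeploymentOptions().setInstances(instances)
    }
}
```

Reading the core count at startup avoids hard-coding a number like 4 or 16 that only fits one machine.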
I am trying to run two different instances of a Java thread I created, and I see it create the two threads, but only one thread is being called and used at a time. If I run them separately they work fine, but when I try to run the two threads at the same time only one is being updated.
Here is the pertinent code from a main() method:
for( Properties prop: propList){
Send send = new MainTest().new Send(iterationOffset,prop);
send.start();
}
public class Send extends Thread{
private double iterationOffset;
private Properties prop;
Send(double off, Properties p){
this.iterationOffset = off;
this.prop = p;
}
@Override
public void run() {
while (true) {
String id = prop.get("platform.id").toString();
System.out.println("$$$$$$$$$$$$$$ create Thread : send = " +id);
sendData( iterationOffset, prop );
}
}
private void sendData(double iterationOffset, Properties prop ){
String id = prop.get("platform.id").toString();
// *** the itDataList is a really large CSV it inputs,
// so it will spin here a long time
for (it itData : itDataList) {
try {
//*** it will send some data here
Thread.sleep(1000);
} catch (InterruptedException e) {
log.error("Sleep thread was interrupted.");
}
}
}
}