I wrote some code using the documents4j library to convert some documents from .docx to .pdf.
I followed the examples in the documentation and the conversion works perfectly using MS Word, but I noticed that after all conversions complete and the methods return, the Java application keeps running and never exits.
If I explicitly close the converter using the execute() and shutDown() methods instead of schedule(), the application exits, but I need this application to run conversions concurrently, so I can't simply invoke shutDown(): that would cause MS Word to exit and break documents that are still open.
What is the best way to use the converter to achieve these objectives? Does LocalConverter have a method to check whether there is a queue of documents still to be converted? With this information I could invoke shutDown() only when the queue is empty and instantiate a new LocalConverter on the next conversion request.
Thanks in advance for your replies!
Dan
I am the maintainer of documents4j.
You are right, the LocalConverter does not currently await the termination of running conversions when it is shut down. I added a grace period, corresponding to the conversion timeout, during which running conversions can finish; it will be included in the next version of documents4j. I will release a new version once I have looked into a pending issue with escaping paths of folders containing spaces.
In the meantime, I recommend implementing something similar yourself. Every conversion emits a Future. Simply collect all the futures in a Set and then call get on each future. Once all futures have returned (i.e. all conversions are complete), it is safe to shut down the local converter:
IConverter converter = ...;
Set<Future<?>> futures = new HashSet<>();
for ( ... ) {
    futures.add(converter.from(...).to(...).schedule());
}
for (Future<?> future : futures) {
    future.get();
}
converter.shutDown();
The above is safe because all conversions run concurrently while the main thread blocks until all futures have completed. Future::get blocks until its conversion has completed, but returns immediately if the conversion is already complete. This way you make sure you do not reach shutDown before all conversions are done.
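Note that Future::get throws checked exceptions, so in real code the waiting loop would look more like the following sketch (how you react to a failed conversion is up to you):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

for (Future<?> future : futures) {
    try {
        future.get(); // blocks until this conversion has completed
    } catch (ExecutionException e) {
        e.getCause().printStackTrace(); // the conversion itself failed
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore the interrupt flag and stop waiting
        break;
    }
}
converter.shutDown();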
How is Apache NIO HttpAsyncClient able to wait for a remote response without blocking any thread? Does it have a way to set up a callback with the OS (I doubt it)? Otherwise, does it perform some sort of polling?
EDIT - THIS ANSWER IS WRONG. PLEASE IGNORE AS IT IS INCORRECT.
You did not specify a version, so I cannot point you to source code. But to answer your question, the way that Apache does it is by returning a Future<T>.
Take a look at this link -- https://hc.apache.org/httpcomponents-asyncclient-4.1.x/current/httpasyncclient/apidocs/org/apache/http/nio/client/HttpAsyncClient.html
Notice how the link says nio in the package. That stands for "non-blocking IO". And nine times out of ten, that is done by doing some work on a new thread.
This operates almost exactly like a CompletableFuture<T> from your first question. Long story short, the library kicks off the process in a new thread (just like CompletableFuture<T>), stores that thread into the Future<T>, then allows you to use that Future<T> to manage that newly created thread containing your non-blocking task. By doing this, you get to decide exactly when and where the code blocks, potentially giving you the chance to make some significant performance optimizations.
To be more explicit, let's give a pseudocode example. Say I have a method attached to an endpoint; whenever the endpoint is hit, the method is executed. The method takes in a single parameter, userID. I then use that userID to perform two operations: fetch the user's personal info and fetch the user's suggested content. I need both pieces, and neither request needs to wait for the other to finish before starting. So what I do is something like the following.
public StoreFrontPage visitStorePage(int userID)
{
    final Future<UserInfo> userInfoFuture = this.fetchUserInfo(userID);
    final Future<PageSuggestion> recommendedContentFuture = this.fetchRecommendedContent(userID);
    final UserInfo userInfo = userInfoFuture.get();
    final PageSuggestion recommendedContent = recommendedContentFuture.get();
    return new StoreFrontPage(userInfo, recommendedContent);
}
When I call this.fetchUserInfo(userID), my code creates a new thread and starts fetching user info on it, but lets my main thread continue and kick off this.fetchRecommendedContent(userID) in the meantime. The two fetches occur in parallel.
However, I need both results in order to create my StoreFrontPage. So, when I decide that I cannot continue any further until I have the results from both fetches, I call Future::get on each of my fetches. What this method does is merge the new thread back into my original one. In short, it says "wait for that thread you created to finish doing what it was doing, then hand over the result as a return value".
And to more explicitly answer your question, no, this tool does not require you to do anything involving callbacks or polling. All it does is give you a Future<T> and lets you decide when you need to block the thread to wait on that Future<T> to finish.
I have never used a ForkJoinPool and I came across this code snippet.
I have a Set<Document> docs. Document has a write method. If I do the following, do I need to have a get or join to ensure that all the docs in the set have correctly finished their write method?
ForkJoinPool pool = new ForkJoinPool(concurrencyLevel);
pool.submit(() ->
    docs.parallelStream().forEach(doc -> {
        doc.write();
    })
);
What happens if one of the docs is unable to complete its write? Say it throws an exception. Does the code given wait for all the docs to complete their write operations?
ForkJoinPool.submit(Runnable) returns a ForkJoinTask representing the pending completion of the task. If you want to wait for all documents to be processed, you need some form of synchronization with that task, like calling its get() method (from the Future interface).
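For example, a minimal sketch reusing the snippet from the question (Document is the asker's type; write is assumed to throw no checked exceptions):

import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;

static void writeAll(Set<Document> docs, int concurrencyLevel)
        throws InterruptedException, ExecutionException {
    ForkJoinPool pool = new ForkJoinPool(concurrencyLevel);
    // submit returns a ForkJoinTask; get() blocks until the whole pipeline is done
    ForkJoinTask<?> task = pool.submit(() -> docs.parallelStream().forEach(Document::write));
    task.get(); // rethrows any stream failure wrapped in an ExecutionException
}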
Concerning exception handling: as usual, any exception during the stream processing will stop it. However, you have to refer to the documentation of Stream.forEach(Consumer):
The behavior of this operation is explicitly nondeterministic. For parallel stream pipelines, this operation does not guarantee to respect the encounter order of the stream, as doing so would sacrifice the benefit of parallelism. For any given element, the action may be performed at whatever time and in whatever thread the library chooses. […]
This means that you have no guarantee of which document will be written if an exception occurs. The processing will stop but you cannot control which document will still be processed.
If you want to make sure that the remaining documents are processed, I would suggest 2 solutions:
surround the document.write() with a try/catch to make sure no exception propagates, but this makes it difficult to check which document succeeded or if there was any failure at all; or
use another solution to manage your parallel processing, like the CompletableFuture API. As noted in the comments, your current solution is a hack that works thanks to implementation details, so it would be preferable to do something cleaner.
Using CompletableFuture, you could do it as follows:
List<CompletableFuture<Void>> futures = docs.stream()
    .map(doc -> CompletableFuture.runAsync(doc::write, pool))
    .collect(Collectors.toList());
This will make sure that all documents are processed, and you can then inspect each future in the returned list for success or failure.
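A minimal sketch of that inspection step (assuming the docs, pool and futures variables from above):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

for (CompletableFuture<Void> future : futures) {
    try {
        future.join(); // blocks until this particular write has finished
    } catch (CompletionException e) {
        // this write failed; the remaining writes still run to completion
        System.err.println("write failed: " + e.getCause());
    }
}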
The D documentation is a bit difficult to understand; how do I achieve the equivalent of the following Java code in D?
ExecutorService service = Executors.newFixedThreadPool(num_threads);
for (File f : files) {
    service.execute(() -> process(f));
}
service.shutdown();
try {
    service.awaitTermination(24, TimeUnit.HOURS);
} catch (InterruptedException e) {
    e.printStackTrace();
}
Would I use std.parallelism or std.concurrency, or is this functionality not available in the standard library?
The example you posted is best represented by std.parallelism. You can use the parallel helper function there: when used in a foreach, it automatically executes the loop body in a thread pool whose worker count defaults to totalCPUs - 1. You can change this default by setting defaultPoolThreads = x; before running any parallel code (best done at the start of your main) or by using a custom taskPool.
Your code would then basically translate to this:
foreach (f; files.parallel) {
    process(f); // or just paste what should be done with f in here if it matters
}
std.parallelism is the high-level implementation of multithreading. If you just want a task pool, you can create a new TaskPool() (the number of workers is an optional argument) and then do the same as above using service.parallel(files).
Alternatively, you could queue lots of tasks using
foreach (f; files) {
    service.put(task!process(f));
}
service.finish(true); // true = blocking
// you could also pass false here and poll in a loop with sleeps
which would then allow you to implement a timeout.
However, I would recommend using parallel because it handles the code above for you and lets each worker access the enclosing stack frame, so you can use it just like a normal non-parallel foreach loop.
A side-note/explanation on the documentation:
std.concurrency is also very useful, though not what you would use for your example. It contains a spawn function which spawns a new thread offering the powerful messaging API. With the messaging API (send and receive) you can implement thread-safe value passing between threads without resorting to sockets, files or other workarounds.
When you have a task (a thread with the messaging API) and call receive in it, it will wait until the passed timeout expires or another thread calls send on the task. For example, you could have a file-loading task which always waits in receive; when e.g. the UI puts a file into the loading queue (by calling send once or more), it can work on those files and send them back to the UI task, which receives them using a timeout in the main loop.
std.concurrency also has a FiberScheduler, which can be used to do thread-style programming in a single thread. For example, if you have a UI which does drawing, input handling and all sorts of things, it can call the FiberScheduler in the main loop on every tick, and all currently running tasks will continue where they last stopped (by calling yield). This is useful when you have, say, an image generator which takes long to run: you don't want to block the UI for too long, so you call yield() every iteration or so to halt the execution of the generator and do one step of the main loop.
When fibers aren't running they can even be passed between threads, so you could combine a thread pool from std.parallelism with a custom FiberScheduler implementation and do load balancing, which could be useful in a web server, for example.
If you want to create fibers without a FiberScheduler and call them raw (checking their finish state and removing them from any custom scheduler implementation yourself), you can inherit from the Fiber class in core.thread, which works much the same as a Thread; you just need to call Fiber.yield() every time you wait or are in a CPU-intensive section.
However, because most APIs aren't made for fibers, they will block and make fibers seem rather useless, so you definitely want to use an API that is fiber-aware there. For example, vibe.d has lots of fiber-based functions, but also a custom std.concurrency implementation, so you need to look out for that.
But just to come back to your question, a TaskPool or in your particular case the parallel function is what you need.
https://dlang.org/phobos/std_parallelism.html#.parallel
https://dlang.org/phobos/std_parallelism.html#.TaskPool.parallel
I just started to learn programming (two weeks ago), and I am trying to make a bot for a game. In the main class of the bot, there are three methods that need to return within 2 seconds, or they will return null. I want to avoid returning null and instead return whatever has been calculated during those 2 seconds.
public ArrayList<PlaceArmiesMove> getPlaceArmiesMoves(BotState state, Long timeOut){
    ArrayList<PlaceArmiesMove> placeArmiesMoves = new ArrayList<PlaceArmiesMove>();
    // calculations filling the ArrayList
    return placeArmiesMoves;
}
What I want to do is return placeArmiesMoves after 2 seconds, whether the method has finished running or not. I have read about Guava's SimpleTimeLimiter and callWithTimeout(), but I am totally lost about how to use them (I read something about multithreading, but I just don't understand what it is).
I would be incredibly grateful if someone could help me! Thanks.
Given a function like getPlaceArmiesMoves, there are several techniques you might use to bound its execution time.
Trust the function to keep track of time itself
If the function runs a loop, it can check on every iteration whether the time has expired.
long startTime = System.currentTimeMillis();
for (;;) {
    // do some work
    long elapsed = System.currentTimeMillis() - startTime;
    if (elapsed >= timeOut) {
        break;
    }
}
This technique is simple, but there is no guarantee it will complete before the timeout; it depends on the function and how granular you can make the work (of course, if it's too granular, you'll spend more time testing whether the timeout has expired than actually doing work).
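Applied to the method from the question, this could look like the following sketch (hasMoreWork and computeNextMove are hypothetical helpers standing in for the real calculation):

import java.util.ArrayList;

public ArrayList<PlaceArmiesMove> getPlaceArmiesMoves(BotState state, Long timeOut) {
    ArrayList<PlaceArmiesMove> placeArmiesMoves = new ArrayList<PlaceArmiesMove>();
    long startTime = System.currentTimeMillis();
    while (hasMoreWork()) { // hypothetical: is there another candidate move to evaluate?
        placeArmiesMoves.add(computeNextMove(state)); // hypothetical: one small unit of work
        if (System.currentTimeMillis() - startTime >= timeOut) {
            break; // deadline reached: return the partial list instead of null
        }
    }
    return placeArmiesMoves;
}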
Run the function in a thread, and ask it to stop
I'm not familiar with Guava, but this seems to be what SimpleTimeLimiter is doing. In Java, it isn't generally possible to forcibly stop a thread, though it is possible to give up on the thread after a timeout (the function will run to completion, but you use the partial result you already have and ignore the complete result that comes in too late). Guava says that it interrupts the thread if it has not returned before the timeout. This works only if your function tests whether it has been interrupted, much like the "trust your function" technique.
See this answer for an example on how to test if your thread has been interrupted. Note that some Java methods (like Thread.sleep) may throw InterruptedException if the thread is interrupted.
In the end, sprinkling checks for isInterrupted() all over your function won't be much different from sprinkling manual checks for the timeout. So, running in a thread, you still must trust your function, but there may be nicer helpers available for that sort of thing (e.g. Guava).
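In code, such a cooperative loop amounts to this minimal sketch (the actual unit of work is elided):

while (!Thread.currentThread().isInterrupted()) {
    // do one small unit of work and record its partial result
}
// return whatever partial results were collected before the interrupt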
Run the function in a separate process, and kill it
An example of how to do this is left as an exercise, but if you run your function in a separate process (or a thread in languages that support forcibly stopping threads, e.g. Erlang, Ruby, others), then you can use the operating system facilities to kill the process if it does not complete after a timeout.
Having that process return a partial result will be challenging. It could periodically send "work-in-progress" to the calling process over a pipe, or periodically save work to a file.
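For what it's worth, the supervising side could look like this rough sketch (worker.jar is a hypothetical program that periodically saves partial results to a file; exception handling omitted):

import java.util.concurrent.TimeUnit;

// start the worker as a separate JVM process
Process worker = new ProcessBuilder("java", "-jar", "worker.jar").start();
if (!worker.waitFor(2, TimeUnit.SECONDS)) {
    worker.destroyForcibly(); // the worker did not finish in time, so kill it
}
// afterwards, read whatever partial results the worker managed to save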
Use Java's Timer class; however, this will require you to understand concepts such as threads and method overriding. Nevertheless, if this is what you require, the answer is quite similar to this question: How to set a timer in Java.
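For illustration, a minimal sketch of that approach (the calculation loop is assumed to check the flag and return its partial list once the flag is set):

import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

final AtomicBoolean timeUp = new AtomicBoolean(false);
new Timer(true).schedule(new TimerTask() { // true = run the timer as a daemon thread
    @Override
    public void run() {
        timeUp.set(true); // fires once, 2000 ms after scheduling
    }
}, 2000);
// the calculation loop polls timeUp.get() and returns the partial result when it turns true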
We are using Elasticsearch 0.90.7 in our Scala Play Framework application, where the end of our "doSearch" method looks like:
def doSearch(...) = {
  ...
  val actionRequestBuilder: ActionRequestBuilder = ... // constructed earlier in the method
  val executedFuture: ListenableActionFuture[Response] = actionRequestBuilder.execute
  executedFuture.actionGet
}
where ListenableActionFuture extends java.util.concurrent.Future, and ListenableActionFuture#actionGet is basically the same as Future#get
This all works fine when we execute searches sequentially; however, when we try to execute multiple searches in parallel:
val search1 = scala.concurrent.Future(doSearch(...))
val search2 = scala.concurrent.Future(doSearch(...))
return Await.result(search1, defaultDuration) -> Await.result(search2, defaultDuration)
we're sometimes (less than 1 or 2% of the time) getting unexpected timeouts on our Scala futures, even when using an extremely long timeout during QA (5 seconds, where a search always executes in less than 200 ms). This also occurs when using the Scala global execution context as well as the Play default execution context.
Is there some sort of unexpected interaction going on here as a result of having a java future wrapped in a scala future? I would have thought that the actionGet call on the java future at the end of doSearch would have prevented the two futures from interfering with each other, but evidently that may not be the case.
I thought it was established somewhere that blocking is evil. Evil!
In this case, Await.result will block the current thread, because it's waiting for a result.
Await wraps the call in blocking, in an attempt to notify the thread pool that it might want to grow some threads to maintain its desired parallelism and avoid deadlock.
If the current thread is not a Scala BlockContext, then you get mere blockage.
Whatever your precise configuration, presumably you're holding onto a thread while blocked, and the thunk you're running for the search wants to run something and can't because the pool is exhausted.
What's relevant is what pool produced the current Thread: whether the go-between Future is on a different pool doesn't matter if, at bottom, you need to use more threads from the current pool and it is exhausted.
Of course, that's just a guess.
It makes more sense to have a single future that gets the value from both searches, with a timeout.
But if you wind up with multiple Futures, it makes sense to use Future.sequence and wait on that.