We are using Elasticsearch 0.90.7 in our Scala Play Framework application, where the end of our "doSearch" method looks like:
def doSearch(...) = {
...
val actionRequestBuilder: ActionRequestBuilder // constructed earlier in the method
val executedFuture: ListenableActionFuture[Response] = actionRequestBuilder.execute
executedFuture.actionGet
}
where ListenableActionFuture extends java.util.concurrent.Future, and ListenableActionFuture#actionGet is basically the same as Future#get.
This all works fine when we execute searches sequentially. However, when we try to execute multiple searches in parallel:
val search1 = scala.concurrent.Future(doSearch(...))
val search2 = scala.concurrent.Future(doSearch(...))
return Await.result(search1, defaultDuration) -> Await.result(search2, defaultDuration)
we sometimes (less than 1 or 2% of the time) get unexpected timeouts on our Scala futures, even when using an extremely long timeout during QA (5 seconds, where a search always executes in less than 200ms). This occurs with both the Scala global execution context and the Play default execution context.
Is there some sort of unexpected interaction going on here as a result of having a java future wrapped in a scala future? I would have thought that the actionGet call on the java future at the end of doSearch would have prevented the two futures from interfering with each other, but evidently that may not be the case.
I thought it was established somewhere that blocking is evil. Evil!
In this case, Await.result will block the current thread, because it's waiting for a result.
Await wraps the call in blocking, in an attempt to notify the thread pool that it might want to grow some threads to maintain its desired parallelism and avoid deadlock.
If the current thread is not a Scala BlockContext, then you get mere blockage.
Whatever your precise configuration, presumably you're holding onto a thread while blocked, and the thunk you're running for search wants to run something and can't because the pool is exhausted.
What's relevant is what pool produced the current Thread: whether the go-between Future is on a different pool doesn't matter if, at bottom, you need to use more threads from the current pool and it is exhausted.
Of course, that's just a guess.
It makes more sense to have a single future that gets the value from both searches, with a timeout.
But if you wind up with multiple Futures, it makes sense to use Future.sequence and wait on that.
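Since the search ultimately rests on a Java future, here is a rough Java analogue of the "single future with one timeout" idea. It is only a sketch: doSearch is a hypothetical stand-in that returns a string, and the pool and timeout values are assumptions, not taken from the question. The two searches are combined into one future, and the caller blocks exactly once.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CombinedSearch {
    // Hypothetical stand-in for the real doSearch(...) from the question.
    static String doSearch(String query) {
        return "results for " + query;
    }

    // Runs both searches on the given pool and waits once, with one timeout, for both.
    static List<String> searchBoth(ExecutorService pool, long timeoutMillis) {
        CompletableFuture<String> search1 = CompletableFuture.supplyAsync(() -> doSearch("a"), pool);
        CompletableFuture<String> search2 = CompletableFuture.supplyAsync(() -> doSearch("b"), pool);
        CompletableFuture<List<String>> both =
            search1.thenCombine(search2, (r1, r2) -> Arrays.asList(r1, r2));
        try {
            return both.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("search failed or timed out", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(searchBoth(pool, 5000));
        } finally {
            pool.shutdown();
        }
    }
}
```

The waiting thread here is the caller's, not one of the pool's workers, so the pool cannot be starved by its own blocked tasks.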
How is Apache NIO HttpAsyncClient able to wait for a remote response without blocking any thread? Does it have a way to setup a callback with the OS (I doubt so?). Otherwise does it perform some sort of polling?
EDIT - THIS ANSWER IS WRONG. PLEASE IGNORE AS IT IS INCORRECT.
You did not specify a version, so I can not point you to source code. But to answer your question, the way that Apache does it is by returning a Future<T>.
Take a look at this link -- https://hc.apache.org/httpcomponents-asyncclient-4.1.x/current/httpasyncclient/apidocs/org/apache/http/nio/client/HttpAsyncClient.html
Notice how the link says nio in the package. That stands for "non-blocking IO". And 9 times out of 10, that is done by doing some work with a new thread.
This operates almost exactly like a CompletableFuture<T> from your first question. Long story short, the library kicks off the process in a new thread (just like CompletableFuture<T>), stores that thread into the Future<T>, then allows you to use that Future<T> to manage that newly created thread containing your non-blocking task. By doing this, you get to decide exactly when and where the code blocks, potentially giving you the chance to make some significant performance optimizations.
To be more explicit, let's give a pseudocode example. Let's say I have a method attached to an endpoint. Whenever the endpoint is hit, the method is executed. The method takes in a single parameter --- userID. I then use that userID to perform 2 operations --- fetch the user's personal info, and fetch the user's suggested content. I need both pieces, and neither request needs to wait for the other to finish before starting. So, what I do is something like the following.
public StoreFrontPage visitStorePage(int userID) throws InterruptedException, ExecutionException
{
final Future<UserInfo> userInfoFuture = this.fetchUserInfo(userID);
final Future<PageSuggestion> recommendedContentFuture = this.fetchRecommendedContent(userID);
final UserInfo userInfo = userInfoFuture.get();
final PageSuggestion recommendedContent = recommendedContentFuture.get();
return new StoreFrontPage(userInfo, recommendedContent);
}
When I call this.fetchUserInfo(userID), my code creates a new thread and starts fetching user info on that new thread, but lets my main thread continue and kick off this.fetchRecommendedContent(userID) in the meantime. The 2 fetches are occurring in parallel.
However, I need both results in order to create my StoreFrontPage. So, when I decide that I cannot continue any further until I have the results from both fetches, I call Future::get on each of my fetches. What this method does is merge the new thread back into my original one. In short, it says "wait for that one thread you created to finish doing what it was doing, then output the result as a return value".
And to more explicitly answer your question, no, this tool does not require you to do anything involving callbacks or polling. All it does is give you a Future<T> and lets you decide when you need to block the thread to wait on that Future<T> to finish.
EDIT - THIS ANSWER IS WRONG. PLEASE IGNORE AS IT IS INCORRECT.
I am reading Java 8 in Action, chapter 11 (about CompletableFutures), and it got me thinking about my company's code base.
The Java 8 in Action book says that if you have code like I write down below, you will only use 4 CompletableFutures at a time (if you have a 4 core computer). That means that if you want to perform, for example, 10 operations asynchronously, you will first run the first 4 CompletableFutures, then the second 4, and then the 2 remaining ones, because the default ForkJoinPool.commonPool() only provides the number of threads equal to Runtime.getRuntime().availableProcessors().
In my company's code base, there are @Service classes called AsyncHelpers that contain a method load(), which uses CompletableFutures to load information about a product asynchronously in separate chunks. I was wondering if they only use 4 threads at a time.
There are several such async helpers in my company's code base, for example there's one for the product list page (PLP) and one for the product details page (PDP). A product details page is a page dedicated to a specific product, showing its detailed characteristics, cross-sell products, similar products and many more things.
There was an architectural decision to load the details of the PDP in chunks. The loading is supposed to happen asynchronously, and the current code uses CompletableFutures. Let's look at pseudocode:
static PdpDto load(String productId) {
CompletableFuture<Details> photoFuture =
CompletableFuture.supplyAsync(() -> loadPhotoDetails(productId));
CompletableFuture<Details> characteristicsFuture =
CompletableFuture.supplyAsync(() -> loadCharacteristics(productId));
CompletableFuture<Details> variations =
CompletableFuture.supplyAsync(() -> loadVariations(productId));
// ... many more futures
try {
return new PdpDto( // construct Dto that will combine all Details objects into one
photoFuture.get(),
characteristicsFuture.get(),
variations.get()
// ... plus many more future.get() calls
);
} catch (ExecutionException|InterruptedException e) {
return new PdpDto(); // something went wrong, return an empty DTO
}
}
As you can see, the code above uses no custom executors.
Does this mean that if that load method has 10 CompletableFutures and there are currently 2 people loading the PDP page, and we have 20 CompletableFutures to load in total, then all those 20 CompletableFutures won't be executed all at once, but only 4 at a time?
My colleague told me that each user will get 4 threads, but I think the JavaDoc quite clearly states this:
public static ForkJoinPool commonPool()
Returns the common pool instance. This pool is statically constructed; its run state is unaffected by attempts to shutdown() or shutdownNow(). However this pool and any ongoing processing are automatically terminated upon program System.exit(int). Any program that relies on asynchronous task processing to complete before program termination should invoke commonPool().awaitQuiescence, before exit.
Which means that there's only 1 pool with 4 threads for all users of our website.
Yes, but it’s worse than that...
The default size of the common pool is 1 less than the number of processors/cores (or 1 if there’s only 1 processor), so you’re actually processing 3 at a time, not 4.
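A quick way to check this on a given machine is to print the numbers. The sketch below is illustrative only: defaultParallelism is a hypothetical helper that encodes the rule stated above, and the actual value can differ if the system property java.util.concurrent.ForkJoinPool.common.parallelism has been set.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolSize {
    // Hypothetical helper: the default rule described above,
    // one less than the core count but never below 1.
    static int defaultParallelism(int cores) {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        System.out.println("expected default parallelism: " + defaultParallelism(cores));
        // The value the JVM is actually using for the shared common pool.
        System.out.println("actual common pool parallelism: "
            + ForkJoinPool.getCommonPoolParallelism());
    }
}
```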
But your biggest performance hit is with parallel streams (if you use them), because they use the common pool too. Streams are meant to be used for super fast processing, so you don’t want them to share their resources with heavy tasks.
If you have a task that is designed to be async (i.e. it takes more than a few milliseconds), then you should create a pool to run it in. Such a pool can be statically created and reused by all calling threads, which avoids the overhead of pool creation per use. You should also tune the pool size by stress testing your code to find the optimum size to maximise throughput and minimise response time.
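As a minimal sketch of that advice, assuming a pool size of 16 (a placeholder to be replaced by whatever stress testing suggests) and hypothetical loadPhotoDetails / loadCharacteristics stand-ins that return strings, the load() method could pass a shared, statically created executor as the second argument of supplyAsync:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DedicatedPoolLoader {
    // Created once and shared by all callers. The size 16 is an assumption, not a recommendation.
    static final ExecutorService IO_POOL = Executors.newFixedThreadPool(16);

    // Hypothetical stand-ins for the slow calls in the question.
    static String loadPhotoDetails(String productId) {
        return "photos:" + productId;
    }

    static String loadCharacteristics(String productId) {
        return "characteristics:" + productId;
    }

    static String load(String productId) {
        // The second argument keeps these tasks off ForkJoinPool.commonPool().
        CompletableFuture<String> photos =
            CompletableFuture.supplyAsync(() -> loadPhotoDetails(productId), IO_POOL);
        CompletableFuture<String> characteristics =
            CompletableFuture.supplyAsync(() -> loadCharacteristics(productId), IO_POOL);
        // join() waits like get() but throws unchecked exceptions.
        return photos.join() + ", " + characteristics.join();
    }

    public static void main(String[] args) {
        try {
            System.out.println(load("42"));
        } finally {
            IO_POOL.shutdown(); // worker threads are non-daemon, so shut down to let the JVM exit
        }
    }
}
```

With this arrangement, parallel streams and other users of the common pool no longer compete with the loading tasks for threads.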
In my company's code base, there are [...] classes [...] that contain a method load(), that uses CompletableFutures to load information [...]
So, are you saying that the load() method waits for I/O to complete?
If so, and if what @Bohemian says is true, then you should not be using the default thread pool.
@Bohemian says that the default pool has approximately the same number of threads as your host has CPUs. That's great if your application has a lot of compute bound tasks to perform in the background. But it's not so great if your application has a lot of threads that are waiting for replies from different network services. That's a whole different story.
I am not an expert in the subject, and I don't know how (apart from doing experiments) to find out what the best number of threads is, but whatever that number is, it's going to have little to do with how many CPUs your system has, and therefore, you should not be using the default pool for that purpose.
The D documentation is a bit difficult to understand, how do I achieve the following Java code in D?
ExecutorService service = Executors.newFixedThreadPool(num_threads);
for (File f : files) {
service.execute(() -> process(f));
}
service.shutdown();
try {
service.awaitTermination(24, TimeUnit.HOURS);
} catch (InterruptedException e) {
e.printStackTrace();
}
Would I use std.parallelism or std.concurrency, or is this functionality not available in the standard library?
The example you posted is best represented by std.parallelism. You can use the parallel helper function in there; when used in a foreach, it automatically executes the body of the loop in a thread pool with a worker count of totalCPUs - 1. You can change this default value by setting defaultPoolThreads = x; before running any parallel code (best done at the start of your main) or by using a custom taskPool.
Basically, your code would then translate to this:
foreach (f; files.parallel) {
process(f); // or just paste what should be done with f in here if it matters
}
std.parallelism is the high-level implementation of multithreading. If you want to just have a task pool you can create a new TaskPool() (with number of workers as optional argument) and then do the same as above using service.parallel(files).
Alternatively you could queue lots of tasks using
foreach (f; files) {
service.put(task!process(f));
}
service.finish(true); // true = blocking
// you could also do false here in a while true loop with sleeps to implement a timeout
which would then let you implement a timeout.
Though I would recommend using parallel because it handles the code above for you + gives each thread a storage to access the local stack so you can use it just the same as a normal non-parallel foreach loop.
A side-note/explanation on the documentation:
The std.concurrency is also very useful, though not what you would use with your example. In it there is a spawn function which is spawning a new thread with the powerful messaging API. With the messaging API (send and receive) you can implement thread-safe value passing between threads without using sockets, files or other workarounds.
When you have a task (thread with messaging API) and call receive in it it will wait until the passed timeout is done or another thread calls the send function on the task. For example you could have a file loading queue task which always waits using receive and when e.g. the UI puts a file into the loading queue (just by calling send once or more) it can work on these files and send them back to the UI task which receives using a timeout in the main loop.
std.concurrency also has a FiberScheduler which can be used to do thread style programming in a single thread. For example if you have a UI which does drawing and input handling and all sorts of things it can then in the main loop on every tick call the FiberScheduler and all the currently running tasks will continue where they last stopped (by calling yield). This is useful when you have like an image generator which takes long to generate, but you don't want to block the UI for too long so you call yield() every iteration or so to halt the execution of the generator and do one step of the main loop.
When fibers aren't running they can even be passed around threads so you can have a thread pool from std.parallelism and a custom FiberScheduler implementation and do load balancing which could be useful in a web server for example.
If you want to create Fibers without a FiberScheduler and call them raw (and check their finish states and remove them from any custom scheduler implementation) you can inherit the Fiber class from core.thread, which works exactly the same as a Thread, you just need to call Fiber.yield() every time you wait or think you are in a CPU intensive section.
Though because most APIs aren't made for Fibers they will block and make Fibers seem kind of useless, so you definitely want to use some API which uses Fibers there. For example vibe.d has lots of fiber based functions, but a custom std.concurrency implementation so you need to look out for that.
But just to come back to your question, a TaskPool or in your particular case the parallel function is what you need.
https://dlang.org/phobos/std_parallelism.html#.parallel
https://dlang.org/phobos/std_parallelism.html#.TaskPool.parallel
I am experimenting with a game mechanic in which players can run scripts on in-game computers. Script execution will be resource limited at a gameplay level to some amount of instructions per tick.
The following proof-of-concept demonstrates a basic level of sandboxing and throttling of arbitrary user code. It successfully runs ~250 instructions of poorly crafted 'user input' and then discards the coroutine. Unfortunately, the Java process never terminates. A little investigation shows that the LuaThread created by LuaJ for the coroutine is hanging around forever.
SandboxTest.java:
public static void main(String[] args) {
Globals globals = JsePlatform.debugGlobals();
LuaValue chunk = globals.loadfile("res/test.lua");
chunk.call();
}
res/test.lua:
function sandbox(fn)
-- read script and set the environment
f = loadfile(fn, "t")
debug.setupvalue(f, 1, {print = print})
-- create a coroutine and have it yield every 50 instructions
local co = coroutine.create(f)
debug.sethook(co, coroutine.yield, "", 50)
-- demonstrate stepped execution, 5 'ticks'
for i = 1, 5 do
print("tick")
coroutine.resume(co)
end
end
sandbox("res/badfile.lua")
res/badfile.lua:
while 1 do
print("", "badfile")
end
The docs suggest that a coroutine that is considered unresumable will be garbage collected and an OrphanedThread exception will be thrown, signalling the LuaThread to end - but this is never happening. My question is in two parts:
Am I doing something fundamentally wrong to cause this behaviour?
If not, how should I handle this situation? From the source it appears that if I can get a reference to the LuaThread in Java I may be able to forcibly abandon it by issuing an interrupt(). Is this a good idea?
Reference: Lua / Java / LuaJ - Handling or Interrupting Infinite Loops and Threads
EDIT: I have posted a bug report over at the LuaJ SourceForge. It discusses the underlying issue (threads not being garbage collected as in the Lua spec) and suggests some ways to work around it.
It seems to be a limitation of LuaJ. I submitted a ticket earlier this year on Sourceforge as I see you've also done. The LuaThread class doesn't store references to the Java threads it creates, so you can't interrupt() those threads without modifying the LuaJ core to expose them:
new Thread(this, "Coroutine-"+(++coroutine_count)).start();
It may be dangerous to interrupt those threads without adding appropriate cleanup code to LuaJ.
Documentation that you provided for OrphanedThread also tells us that scope is the defining condition:
"Error sublcass that indicates a lua thread that is no longer referenced has been detected. The java thread in which this is thrown should correspond to a LuaThread being used as a coroutine that could not possibly be resumed again because there are no more references to the LuaThread with which it is associated. Rather than locking up resources forever, this error is thrown, and should fall through all the way to the thread's Thread.run() method."
Your code example doesn't cause all LuaThread references to disappear, so you shouldn't expect an exception to be thrown. CoroutineLib documentation indicates: Coroutines that are yielded but never resumed to complete their execution may not be collected by the garbage collector, so an OutOfMemoryError should actually be expected from the code you listed on SourceForge, if I'm not mistaken. LuaThread:52 also specifies: Applications should not catch OrphanedThread, because it can break the thread safety of luaj., which is yet another obstacle.
There also seem to be differences between empty and non-empty while loops in Lua/J. IIRC, empty loops (while true do end) don't obey all coroutine hook/tick rules. Because no actions occur in an empty loop, there's no opportunity for certain hooks to occur (I need to test this again so please correct me otherwise!).
A forked version of LuaJ with the functionality we're looking for is used in the ComputerCraft mod for Minecraft, though it's designed only for the mod and isn't open source.
I just started to learn programming (2 weeks ago), and I am trying to make a bot for a game. In the main class of the bot, there are 3 methods that need to return within 2 seconds, or it will return null. I want to avoid returning null and instead return whatever has been calculated during those 2 seconds.
public ArrayList<PlaceArmiesMove> getPlaceArmiesMoves(BotState state, Long timeOut){
ArrayList<PlaceArmiesMove> placeArmiesMoves = new ArrayList<PlaceArmiesMove>();
// calculations filling the ArrayList
return placeArmiesMoves;
}
What I want to do is return placeArmiesMoves after 2 seconds, whether the method has finished running or not. I have read about Guava's SimpleTimeLimiter and callWithTimeout(), but I am totally lost about how to use it (I read something about multithreading, but I just don't understand what this is).
I would be incredibly grateful if someone could help me! Thanks.
Given a function like getPlaceArmiesMoves, there are several techniques you might use to bound its execution time.
Trust the function to keep track of time itself
If the function runs a loop, it can check on every iteration whether the time has expired.
long startTime = System.currentTimeMillis();
for (;;) {
// do some work
long elapsed = System.currentTimeMillis() - startTime;
if (elapsed >= timeOut) {
break;
}
}
This technique is simple, but there is no guarantee it will complete before the timeout; it depends on the function and how granular you can make the work (of course, if it's too granular, you'll be spending more time testing if the timeout has expired than actually doing work).
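Applied to a method that fills a list, the idea might look like the sketch below. The names computeMoves and maxMoves are hypothetical, and squaring a number stands in for one unit of real work. It uses System.nanoTime() rather than currentTimeMillis() because nanoTime is not affected by wall-clock adjustments; that is a swapped-in detail, not something the technique requires.

```java
import java.util.ArrayList;
import java.util.List;

class DeadlineLoop {
    // Fills the list until the work is finished or the time budget is used up,
    // then returns whatever has been computed so far.
    static List<Integer> computeMoves(int maxMoves, long timeOutMillis) {
        long deadline = System.nanoTime() + timeOutMillis * 1_000_000L;
        List<Integer> moves = new ArrayList<>();
        for (int i = 0; i < maxMoves; i++) {
            if (System.nanoTime() - deadline >= 0) {
                break; // out of time: stop and keep the partial result
            }
            moves.add(i * i); // placeholder for one unit of real work
        }
        return moves;
    }

    public static void main(String[] args) {
        // Plenty of time: all five placeholder moves are computed.
        System.out.println(computeMoves(5, 2000));
        // No time at all: the loop exits before doing any work.
        System.out.println(computeMoves(5, 0).size());
    }
}
```

In a real bot the budget passed in should be somewhat shorter than the 2 second limit, to leave time for returning the result.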
Run the function in a thread, and ask it to stop
I'm not familiar with Guava, but this seems to be what SimpleTimeLimiter is doing. In Java, it isn't generally possible to forcibly stop a thread, though it is possible to ignore the thread after a timeout (the function will run to completion, but you've already used its partial result, and ignore the complete result that comes in too late). Guava says that it interrupts the thread if it has not returned before the timeout. This works only if your function is testing to see if it has been interrupted, much like the "trust your function" technique.
See this answer for an example on how to test if your thread has been interrupted. Note that some Java methods (like Thread.sleep) may throw InterruptedException if the thread is interrupted.
In the end, sprinkling checks for isInterrupted() all over your function won't be much different than sprinkling manual checks for the timeout. So running in a thread, you still must trust your function, but there may be nicer helpers available for that sort of thing (e.g. Guava).
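The thread-based approach can be sketched with the standard java.util.concurrent classes instead of Guava. Everything named here (movesWithin, the 10 ms sleep that stands in for real work) is an assumption for illustration. The worker publishes partial results into a thread-safe list and checks its interrupt flag on every iteration; the caller waits up to the timeout, then cancels the task and takes a snapshot of what is there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class InterruptibleSearch {
    static List<Integer> movesWithin(long timeOutMillis) {
        // Thread-safe list so the caller can read partial results while the worker writes.
        List<Integer> moves = new CopyOnWriteArrayList<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> task = executor.submit(() -> {
            int i = 0;
            // The worker must cooperate: it checks the interrupt flag on every iteration.
            while (!Thread.currentThread().isInterrupted()) {
                moves.add(i++);
                try {
                    Thread.sleep(10); // placeholder for one unit of real work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop ends
                }
            }
        });
        try {
            task.get(timeOutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            task.cancel(true); // sends an interrupt to the worker thread
        } catch (InterruptedException e) {
            task.cancel(true);
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // The worker failed; fall through and return what it produced before failing.
        } finally {
            executor.shutdownNow();
        }
        return new ArrayList<>(moves); // snapshot of the partial result
    }

    public static void main(String[] args) {
        System.out.println("moves computed: " + movesWithin(200).size());
    }
}
```

If the worker never checked isInterrupted() and never called an interruptible method such as Thread.sleep, cancel(true) would have no effect on it, which is the caveat described above.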
Run the function in a separate process, and kill it
An example of how to do this is left as an exercise, but if you run your function in a separate process (or a thread in languages that support forcibly stopping threads, e.g. Erlang, Ruby, others), then you can use the operating system facilities to kill the process if it does not complete after a timeout.
Having that process return a partial result will be challenging. It could periodically send "work-in-progress" to the calling process over a pipe, or periodically save work to a file.
Use Java's Timer class (java.util.Timer); however, this will require you to understand concepts such as threads and method overriding. Nevertheless, if this is what you require, the answer is quite similar to the question "How to set a timer in java".