How to use ObjectPool in detached thread? - java

The principle of object pooling is very interesting to me, but it can't really be exploited without multi-threaded execution.
For example, I tried the library furious-objectpool.
Debugging shows that the create/passivate methods are executed in the same request thread. How could I take advantage of this principle by using the pool from another thread?

Object Pools are rather discouraged in Java. They are quite an expensive concept, usually way more expensive than just creating an object (new operator requires ~10 instructions, acquire/release in pools typically need MUCH more).
Also, such long-lived objects in Java tend to mess with the GC, preventing it from cleaning up resources.
I would really encourage you to use some DI container with nice stateless beans. It is both super fast (usually only 1 object per type) and nicely manageable.
However, if you really need to use a pool, make sure you use it for objects with a very expensive construction process - typically some sort of network connection (database connections are the most common example).
As for the other-thread part: such pools are always thread-safe (otherwise, what would be the point?). A typical usage scenario involves some sort of server (like a REST service) that accepts and executes plenty of user requests per minute.
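To make the thread-safety point concrete, here is a minimal hand-rolled sketch (not the furious-objectpool API - a bounded BlockingQueue stands in for the pool, and StringBuilder stands in for an "expensive" object): eight worker threads share four pooled objects, acquiring and releasing across threads.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    // A trivially simple thread-safe pool: a bounded blocking queue of
    // pre-created objects. Real pools add validation, passivation, timeouts.
    static final BlockingQueue<StringBuilder> POOL = new ArrayBlockingQueue<>(4);

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 4; i++) POOL.put(new StringBuilder()); // "expensive" objects

        ExecutorService workers = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
            workers.submit(() -> {
                try {
                    StringBuilder sb = POOL.take();      // acquire: blocks until one is free
                    try {
                        sb.setLength(0);                 // "passivate"/reset before use
                        sb.append(Thread.currentThread().getName());
                    } finally {
                        POOL.put(sb);                    // release back to the pool
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        workers.shutdown();
    }
}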
Edit:
And please - don't use a technology/library just because it looks cool. It more often than not will bring you trouble in the long run.

Related

Project loom: what makes the performance better when using virtual threads?

To give some context here, I have been following Project Loom for some time now. I have read The state of Loom. I have done asynchronous programming.
Asynchronous programming (as provided by Java NIO) returns the thread to the thread pool when a task waits, and it goes to great lengths to not block threads. This gives a large performance gain: we can now handle many more requests, as they are not directly bound by the number of OS threads. But what we lose is the context. A task is now NOT associated with just one thread; all the context is lost once we dissociate tasks from threads. Exception traces do not provide very useful information, and debugging is difficult.
In comes Project Loom with virtual threads that become the single unit of concurrency. And now you can perform a single task on a single virtual thread.
It's all fine until now, but the article goes on to state, with Project Loom:
A simple, synchronous web server will be able to handle many more requests without requiring more hardware.
I don't understand how we get performance benefits with Project Loom over asynchronous APIs. Asynchronous APIs make sure not to keep any thread idle. So, what does Project Loom do to make it more efficient and performant than asynchronous APIs?
EDIT
Let me re-phrase the question. Say we have an HTTP server that takes in requests and does some CRUD operations against a backing persistent database, and that this server handles a lot of requests - 100K RPM. Two ways of implementing this:
The HTTP server has a dedicated pool of threads. When a request comes in, a thread carries the task up until it reaches the DB, wherein the task has to wait for the response from DB. At this point, the thread is returned to the thread pool and goes on to do the other tasks. When DB responds, it is again handled by some thread from the thread pool and it returns an HTTP response.
The HTTP server just spawns virtual threads for every request. If there is an IO, the virtual thread just waits for the task to complete. And then returns the HTTP Response. Basically, there is no pooling business going on for the virtual threads.
Given that the hardware and the throughput remain the same, would any one solution fare better than the other in terms of response times or handling more throughput?
My guess is that there would not be any difference w.r.t. performance.
We don't get benefit over asynchronous API. What we potentially will get is performance similar to asynchronous, but with synchronous code.
The answer by @talex puts it crisply. Adding further to it.
Loom is more about a native concurrency abstraction which, additionally, helps one write asynchronous code. Since it is a VM-level abstraction, rather than just a code-level one (like what we have had so far with CompletableFuture etc.), it lets one implement asynchronous behavior but with reduced boilerplate.
With Loom, a more powerful abstraction is the savior. We have seen this repeatedly: abstraction with syntactic sugar lets one write programs effectively, whether it was FunctionalInterfaces in JDK 8 or for-comprehensions in Scala.
With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources). One can simply write the code synchronously, and with each blocking operation encountered (ReentrantLock, I/O, JDBC calls), the virtual thread gets parked. Because these are lightweight threads, the context switch is way cheaper, distinguishing them from kernel threads.
When blocked, the actual carrier thread (the one that was running the run-body of the virtual thread) gets engaged to execute some other virtual thread's run. So effectively, the carrier thread is not sitting idle but executing some other work, and it comes back to continue the execution of the original virtual thread whenever it is unparked. Just like how a thread pool would work. But here, you have a single carrier thread in a way executing the body of multiple virtual threads, switching from one to another when blocked.
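To make this concrete, a minimal sketch of the pattern (assuming Java 21, where virtual threads are final; the sleep stands in for a blocking DB or I/O call):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadsDemo {
    public static void main(String[] args) {
        // One virtual thread per task; no pooling of virtual threads.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(1_000); // "blocking" call: parks the virtual thread,
                                             // the carrier thread moves on to another one
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        } // close() waits for the submitted tasks to finish
    }
}

A platform-thread pool of, say, 16 threads could never have 10,000 of these sleeps in flight at once; the virtual-thread version can, because parked virtual threads cost almost nothing.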
We get the same behavior (and hence performance) as manually written asynchronous code, while avoiding the boilerplate needed to do the same thing.
Consider the case of a web framework where there is one thread pool to handle I/O and another for the execution of HTTP requests. For simple HTTP requests, one might serve the request from the HTTP-pool thread itself. But if there are any blocking (or high-CPU) operations, we let this activity happen on a separate thread asynchronously.
This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it into a pipeline (read from the database as one stage, followed by computation on it, followed by another stage to write back to the database, make web service calls, etc.). Each one is a stage, and the resultant CompletableFuture is returned to the web framework.
When the resultant future completes, the web framework relays the result back to the client. This is how Play Framework and others have been dealing with it: providing an isolation between the HTTP thread-handling pool and the execution of each request. But if we dive deeper into this, why do we do it?
One core reason is to use the resources effectively - particularly around blocking calls. Hence we chain with thenApply etc. so that no thread is blocked on any activity, and we do more with a smaller number of threads.
This works great, but it is quite verbose. And debugging is indeed painful; if one of the intermediary stages results in an exception, the control flow goes haywire, resulting in further code to handle it.
With Loom, we write synchronous code and let someone else decide what to do when blocked, rather than sleeping and doing nothing.
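To see the boilerplate difference, here is a sketch of both styles doing the same work (fetchUser, compute and save are hypothetical stand-ins for real blocking operations; requires Java 21):

import java.util.concurrent.CompletableFuture;

public class TwoStyles {
    // Hypothetical stand-ins for real blocking work.
    static String fetchUser(int id) { sleep(); return "user-" + id; }
    static String compute(String user) { return user.toUpperCase(); }
    static void save(String result) { sleep(); }
    static void sleep() {
        try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws Exception {
        // Asynchronous style: every step is a chained stage.
        CompletableFuture<Void> pipeline =
                CompletableFuture.supplyAsync(() -> fetchUser(1))
                        .thenApply(TwoStyles::compute)
                        .thenAccept(TwoStyles::save)
                        .exceptionally(ex -> { ex.printStackTrace(); return null; });
        pipeline.join();

        // Loom style: the same logic, written synchronously on a virtual thread.
        // Each blocking call parks the virtual thread, not the carrier thread,
        // and an exception is handled with a plain try/catch and a readable stack trace.
        Thread.startVirtualThread(() -> save(compute(fetchUser(1)))).join();
    }
}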
The http server has a dedicated pool of threads ....
How big of a pool? (Number of CPUs)*N + C? With N>1 it can fall into anti-scaling, as lock contention extends latency; whereas N=1 can under-utilize available bandwidth. There is a good analysis here.
The http server just spawns...
That would be a very naive implementation of this concept. A more realistic one would strive to collect from a dynamic pool which keeps one real thread for every blocked system call plus one for every real CPU. At least that is what the folks behind Go came up with.
The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines: all PEAs in a pod} from fighting over internal resources, so that they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEA divorced from the underlying system thread, because the PEAs are internally multiplexed between the system threads. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.
As 1 indicates, there are tangible results that can be directly linked to this approach, and a few intangibles. Locking is easy -- you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard: hard to get working, hard to choose the fineness of the grain. When to use {locks, CVs, semaphores, barriers, ...} is obvious in textbook examples, a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away and become limited to contended leaf components like malloc().
I maintain some skepticism, as the research typically shows a poorly scaled system which is transformed into a lock-avoidance model, then shown to be better. I have yet to see one which unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.

Java Concurrency: Alternative to Multi Threading (working with non thread safe environment)

I am working with a 3rd-party proprietary library (no source code) which creates instances of a non-thread-safe component. Does this mean that I shouldn't use multiple threads to run jobs in parallel? Running each job in its own JVM crossed my mind, but that is overkill.
Then I read the article here
http://cscarioni.blogspot.com/2011/09/alternatives-to-threading-in-java-stm.html
Is it recommended to follow that article's advice? What other alternatives exist out there?
Response to Martin James:
Vendor tells me that there is only one thread in which multiple instances of the component exist (a Factory pattern creates the component instances), and each instance is independently controllable from its API.
So does this mean that I can still use multiple threads while controlling each component instance running in one big thread?
No, it does not mean this.
It means that you have to take care of data protection yourself. One possible way is to synchronize access to the library in the code that calls it (your code). Another possible way is using immutable objects (for example, make a private copy of a non-thread-safe data structure every time you want to work with it).
Yet another way is to design your application so that the code that works with a certain object always runs in the same thread. This does not mean that code working with another object (even of the same class) cannot run in another thread. So the system is multi-threaded, but no data clashes occur.
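A minimal sketch of that confinement approach (NonThreadSafeComponent is a hypothetical stand-in for the vendor class): every call is funneled through a single-threaded executor, so the component only ever runs on one thread, while any number of caller threads can submit work.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConfinedComponent {
    // Hypothetical stand-in for the vendor's non-thread-safe class.
    static class NonThreadSafeComponent {
        private int state;
        int doWork(int input) { state += input; return state; }
    }

    private final NonThreadSafeComponent component = new NonThreadSafeComponent();
    // Every call runs on this one thread, so the component is never
    // touched concurrently, no matter how many threads call doWork().
    private final ExecutorService confinedTo = Executors.newSingleThreadExecutor();

    public Future<Integer> doWork(int input) {
        return confinedTo.submit(() -> component.doWork(input));
    }
}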
'Vendor tells me that there is only one thread in which multiple instances of the component exist (Factory pattern to create the component instance) and each instance is independently controllable from its API.'
That is not exactly 100% clear. What I think it means is:
1) Creation of components is not thread-safe. Maybe they are all stored internally in a non-threadsafe container. Presumably, destruction of the components is not thread-safe either.
2) Once created, the components are 'independently controllable' - this suggests strongly that they are thread-safe.
That's my take on it so far. Maybe your vendor could confirm it, just to be sure, before you proceed any further with a design.
It all depends on what your code is actually doing with the components. For example, ArrayList is not thread-safe, but Vector is. However, if you use an ArrayList inside a thread in a way that is thread-safe or thread-neutral, it doesn't matter. For example, you can use ArrayLists without any issue in a Java EE container for web services, because each web service call is going to be on its own thread, and no one in their right mind would have web-service-handling threads communicating with each other. In fact, Vectors are bad in a Java EE container if you can avoid them, because they are synchronized on most of their methods, which means the container's threads will block until any operation is done.
As AlexR said, you can synchronize things, but the best approach is to really look at your code and figure out if the threads are actually going to be sharing data and state or going off and doing their own thing.

Sending objects back and forth between threads in java?

I have multiple client-handler threads. These threads need to pass a received object to a server queue, and the server queue will pass another type of object back to the sending thread. The server queue is started and keeps running while the server runs. I am not sure which thread mechanism to use for notifying the client-handler threads that an object has been sent back. I don't intend to use a socket or write to a file.
If you want to do actual message passing, take a look at SynchronousQueue. Each thread will have a reference to the queue and will wait until another thread passes a reference through it.
This is thread-safe and addresses your requirements.
If you are simply looking to have threads read and write a shared variable, you can use normalocity's suggestion, though its thread-safety depends on how you access it (via synchronized or volatile).
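A minimal sketch of such a hand-off (the String payloads stand in for your request/response objects):

import java.util.concurrent.SynchronousQueue;

public class HandoffDemo {
    public static void main(String[] args) {
        // A SynchronousQueue has no capacity: put() blocks until take() and
        // vice versa, so each transfer is a direct thread-to-thread hand-off.
        SynchronousQueue<String> toServer = new SynchronousQueue<>();
        SynchronousQueue<String> toClient = new SynchronousQueue<>();

        Thread server = new Thread(() -> {
            try {
                String request = toServer.take();          // wait for a request
                toClient.put("response to " + request);    // hand the result back
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.start();

        try {
            toServer.put("request-1");
            System.out.println(toClient.take());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}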
As far as making objects accessible goes, there is no difference in Java between multi-threaded and single-threaded programs. You just follow the scope rules (public, private, protected), and that's it. Multiple threads all run within the same process, so there aren't any special thread-only scope rules to know about.
For example, define a method where you pass the object in, and make that method accessible from the other thread. The object you want to pass around simply needs to be accessible from the other thread's scope.
As far as thread-safety goes, you can synchronize your writes, and for the most part that will take care of things. Thread safety can get a bit hairy the more complicated your code gets, but I think this will get you started.
One method for processing objects, and producing result objects is to have a shared array or LinkedList that acts as a queue of objects, containing the objects to be processed, and the resulting objects from that processing. It's hard to go into much more detail than that without more specifics on what exactly you're trying to do, but most shared access to objects between threads comes down to either inter-thread method calls, or some shared collection/queue of objects.
Unless you are absolutely certain that it will always be only a single object at a time, use some sort of Queue.
If you are certain that it will always be only a single object at a time, use some sort of Queue anyway. :-)
Use a concurrent queue from java.util.concurrent.
why? Almost guaranteed to provide better general performance than anything hand-rolled.
recommendation: use a bounded queue and you will get back-pressure for free.
note: the depth of the queue determines your general latency characteristics: shallower queues will have lower latencies at the cost of reduced bandwidth.
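As a sketch of the back-pressure point: with a bounded ArrayBlockingQueue, put() blocks when the queue is full, which throttles the producer automatically (the depth of 16 is arbitrary):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BackPressureDemo {
    public static void main(String[] args) {
        // Depth 16: once 16 items are in flight, producers block in put(),
        // which is exactly the back-pressure mentioned above.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 1_000; i++) queue.put(i); // blocks when full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 1_000; i++) queue.take(); // blocks when empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        consumer.start();
    }
}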
Use Future semantics
why? Futures provide a proven and standard means of getting an asynchronous result.
recommendation: create a simple Request class and expose a method #getFutureResponse(). The implementation of this method can use a variety of signaling strategies, such as Lock, flag (using Atomic/CAS), etc.
note: use of timeout semantics in Future will allow you to link server behavior to your server SLA e.g. #getFutureResponse(sla_timeout_ms).
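One possible shape of that Request class, using CompletableFuture as the signaling strategy (a sketch; the names follow the suggestion above):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Request {
    private final CompletableFuture<String> response = new CompletableFuture<>();

    // Called by the server-queue thread once the result is ready.
    public void complete(String result) { response.complete(result); }

    // Called by the client-handler thread.
    public Future<String> getFutureResponse() { return response; }

    // Timeout variant: ties the wait to your server SLA, as noted above.
    public String getFutureResponse(long slaTimeoutMs) throws Exception {
        return response.get(slaTimeoutMs, TimeUnit.MILLISECONDS);
    }
}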
A book tip for if you want to dive a bit more into communication between threads (or processes, or systems): Pattern-Oriented Software Architecture Volume 2: Patterns for Concurrent and Networked Objects
Just use simple dependency injection.
class MyFirstThread extends Thread {
    private Object data;
    public void setData(Object o) { this.data = o; } // store what the other thread passes in
}

class MySecondThread extends Thread {
    private final MyFirstThread callback;
    MySecondThread(MyFirstThread callback) { this.callback = callback; }
}

// elsewhere:
MyFirstThread t1 = new MyFirstThread();
MySecondThread t2 = new MySecondThread(t1);
t1.start();
t2.start();
You can now do callback.setData(...) in your second thread.
I find this to be the safest way. Other solutions involve using volatile or some kind of shared object, which I think is overkill.
You may also want to use BlockingQueue and pass both of those to each thread. If you plan to have more than one thread then it is probably a better solution.

How to manage executors

It is not infrequent in my practice that software I develop grows big and complex, and various parts of it use executors in their own way. From the performance point of view it would be better to use a different thread pool configuration in each part. But from the maintainability and code-usability point of view it would be preferable if all things related to threads, concurrency and CPU utilization were kept and configured in some centralized place.
Having each class which needs some concurrent execution or scheduling create its own thread pool is not OK, because it is hard to control their life-cycles and overall number of threads.
Creating some kind of ExecutorManager and passing one thread pool around the application is not OK either because, depending on the type of task and the submission rate, an inappropriately configured combination of work queue and thread pool size can harm performance really badly.
So the question is: are there some common approaches that address this issue?
I would create 2 or 3 thread pools that can be configured differently depending on the tasks they execute; if there are more than 3 different kinds of concurrent actions, you have a bigger problem.
The pools can be injected where needed (e.g. by name). Additionally, I would create an annotation to execute a defined method with a specific pool/executor using AOP (e.g. AspectJ).
The annotation resolver should have access to all the pools/executors and submit the task using the one specified in the annotation.
For example:
#Concurrent ("pool1")
public void taskOfTypeOne() {
}
#Concurrent ("pool2")
public void taskOfTypeTwo() {
}
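A sketch of what the @Concurrent annotation itself could look like (the AOP advice that looks up the named pool and submits the task is omitted; all names are illustrative):

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Retained at runtime so the AOP advice can read the pool name
// and submit the method body to the matching executor.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Concurrent {
    String value(); // the executor/pool name, e.g. "pool1"
}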
What you are looking for is Dependency Injection, or Inversion of Control. One of the most popular DI frameworks for Java is Spring. You build ordinary Java objects and, with either specific annotations or XML configuration, wire them together. This way, you can configure your different ExecutorService instances in one place and request that they be injected (possibly by name) into the client classes that need them.
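For example, a sketch using Spring's Java configuration (bean names and pool sizes are illustrative assumptions):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.stereotype.Service;

@Configuration
class ExecutorConfig {
    @Bean(name = "ioPool")   // many threads: tasks mostly wait on I/O
    ExecutorService ioPool() { return Executors.newFixedThreadPool(64); }

    @Bean(name = "cpuPool")  // sized to the machine: tasks burn CPU
    ExecutorService cpuPool() {
        return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    }
}

@Service
class ReportService {
    private final ExecutorService pool;

    // Request the right pool by name; configuration stays in one place.
    ReportService(@Qualifier("ioPool") ExecutorService pool) {
        this.pool = pool;
    }
}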

Should volatile be used for attributes of domain model classes in Java web apps?

Here's my thinking:
Even though an HTTP request cycle is essentially handled by a single thread, each time an HTTP request is processed for the same session it is likely to be processed by a different thread from the thread pool.
Without the volatile keyword being used on a domain model object whose lifecycle extends across multiple HTTP requests for the same session, then, according to my understanding, isn't it possible that the attribute could be cached thread-locally (an optimization by the compiler) in the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread, then that second thread may not see the changes to that attribute that were made by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?
I think you are forgetting that the threads handling the HTTP request first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model - it has to retrieve it from the session implementation at the start of handling each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 had references to the same object instance simultaneously while processing two different requests, and you were concerned about changes made in one thread being visible to the other while each is processing its request, but that does not sound like what you are asking about.
Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and doing a volatile read of the same field in another thread establishes a happens before relationship between the two threads, and ensures visibility of the write.
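A minimal sketch of that happens-before edge, using the classic flag-and-data pattern:

public class VisibilityDemo {
    static int data = 0;                  // plain field
    static volatile boolean ready = false;

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            data = 42;      // plain write...
            ready = true;   // ...published by the volatile write
        });
        Thread reader = new Thread(() -> {
            while (!ready) { /* spin until the volatile read sees true */ }
            // The volatile write of 'ready' happens-before the volatile read
            // above, so 'data' is guaranteed to be visible here as 42.
            System.out.println(data);
        });
        reader.start();
        writer.start();
    }
}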
This is a complicated problem, simply using a volatile keyword is probably not a good solution.
I think your understanding of it is correct. Given your description, I would say it should be used. If it's something more than a primitive type, I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml
If you have a mutable object in session, that is trouble. But usually the solution is not to guard individual fields; rather the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object and modify it in place. It's better to create a complete new user object and insert it into the session.
In that case, the fields in User don't need any protection; thread safety is guaranteed by the session's setAttribute()/getAttribute().
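A sketch of that swap pattern in a Servlet environment (User is assumed immutable; javax.servlet is used here, and the class names are illustrative):

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class ProfileUpdater {
    // User is assumed immutable: all state set at construction, no setters.
    record User(String name, String email) {}

    void updateEmail(HttpServletRequest request, String newEmail) {
        HttpSession session = request.getSession();
        User current = (User) session.getAttribute("user");
        // Never mutate 'current'; build a replacement and swap it in.
        User updated = new User(current.name(), newEmail);
        session.setAttribute("user", updated); // safe publication via the session
    }
}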
If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of the Session, I'd recommend you keep just the object's ID, and use it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrievals will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic for this specific object in a single façade, and do the concurrency control there, using database locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine-grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but that doesn't mean anything if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say, t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute 01, and then 02, and t1 then will execute 02, overwriting what t2 had done. The operations containsKey() and put() are atomic individually, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
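For completeness, one standard way to make the whole check-then-act block atomic is ConcurrentHashMap.computeIfAbsent (a sketch; calculate() is the hypothetical expensive function from the snippet above):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicCheckThenAct {
    static final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical expensive computation from the example above.
    static String calculate(String key) { return key.toUpperCase(); }

    static String getOrCompute(String key) {
        // The lookup and the insert happen as one atomic operation,
        // so two threads can no longer interleave between lines 01 and 02.
        return cache.computeIfAbsent(key, AtomicCheckThenAct::calculate);
    }
}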
When it comes to concurrency, there's no magic. I mean, some crappy frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start to get heavy traffic. Or (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming, and the only way to handle it is to avoid it. The whole functional programming trend is not about dealing with concurrency; it's about avoiding it altogether.
It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."
