I have a class which acts as a simple crawler and I want to invoke this class within a servlet.
My idea is to get a URL from the user; the request is passed to the servlet, the servlet passes the URL to the class, and the class starts crawling. I want my servlet to create only one instance of this class. The data retrieved by the crawler will be added to the DB directly by the class.
I want to control the behavior of the class (running/halting/stopping) from the servlet.
(For this I think I can create a simple XML file shared between the servlet and the class; if the servlet changes the status code, the class should respond to the status change.)
But I have some doubts about how to control the behavior of the class, i.e. how to command it to run/halt/stop. Since my class is not multithreaded, I don't know what will happen to it after the servlet invokes it, and since the class needs to read from the network, there will obviously be some gap/freezing phase while it runs.
How can I solve the problem of concurrency in this situation? Or, in other words, will I have any concurrency issue or not?
regards.
It depends on the Servlet container you are using. Some containers spawn a new Thread per user request (almost always this is the desired behavior), so you should definitely design for concurrency.
You can make the Servlet class implement SingleThreadModel (deprecated since Servlet 2.4); then in the service method you can call the crawler class code directly, as only one thread will enter service on a given servlet instance at a time.
This implies that only one URL can be processed at a given time, which is probably not what you want. So instead, don't implement SingleThreadModel, and create a singleton executor service in the init method:
ExecutorService ex = Executors.newFixedThreadPool(20); //Only 20 tasks at a given time
Then, in the service method, create a new CrawlingTask (a Runnable) with the URL specified in the request and submit the task to the executor.
That way you can also shut it down:
ex.shutdown();
As ExecutorService is thread-safe, you don't have to worry about concurrency when enqueuing tasks.
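The executor approach can be sketched without the servlet API as follows. This is only an illustration: CrawlerService and the visited list are hypothetical names, and the body of CrawlingTask is a placeholder for the real crawl and DB write. A servlet would create the service once in init(), call submit() from the service method, and call shutdown() from destroy().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical task: a real implementation would fetch the page and write to the DB.
class CrawlingTask implements Runnable {
    private final String url;
    private final List<String> visited;

    CrawlingTask(String url, List<String> visited) {
        this.url = url;
        this.visited = visited;
    }

    @Override
    public void run() {
        // Placeholder for the actual crawl of 'url'.
        visited.add(url);
    }
}

// Hypothetical holder for the single executor shared by all requests.
class CrawlerService {
    // At most 20 tasks run at the same time; further tasks wait in the queue.
    private final ExecutorService executor = Executors.newFixedThreadPool(20);
    // Thread-safe list, because several pool threads append to it.
    private final List<String> visited = new CopyOnWriteArrayList<>();

    // Called once per request, e.g. from the servlet's service method.
    void submit(String url) {
        executor.submit(new CrawlingTask(url, visited));
    }

    // Stops accepting new tasks and waits briefly for queued ones to finish.
    boolean shutdown() {
        executor.shutdown();
        try {
            return executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    List<String> visited() {
        return visited;
    }
}
```

Note that shutdown() lets already submitted tasks finish; shutdownNow() would instead interrupt the running ones.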
First, understand the difference between a Class and a Thread. A class is just code, a thread is where the code is executed. You don't stop/halt a class, you stop or halt a thread that is executing code in a class.
I would suggest you start reading up on Java concurrency programming, since what you are describing is very much about multithreading and thread synchronization.
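To make the class/thread distinction concrete, here is a minimal sketch of a worker whose executing thread can be halted, resumed, and stopped from another thread (for example a servlet request thread). ControllableWorker and its method names are made up for illustration, and the loop body stands in for "crawl one page".

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: the class is just code, the Thread running run() is what gets controlled.
class ControllableWorker implements Runnable {
    private final Object lock = new Object();
    private boolean paused = false;   // guarded by lock
    private boolean stopped = false;  // guarded by lock
    final AtomicInteger steps = new AtomicInteger();

    @Override
    public void run() {
        try {
            while (true) {
                synchronized (lock) {
                    while (paused && !stopped) {
                        lock.wait();              // halted: sleep until resumed or stopped
                    }
                    if (stopped) {
                        return;                   // stop request: leave the loop
                    }
                }
                steps.incrementAndGet();          // one unit of work
                Thread.sleep(1);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // treat interruption as a stop request
        }
    }

    void pause() {
        synchronized (lock) { paused = true; }
    }

    void resume() {
        synchronized (lock) { paused = false; lock.notifyAll(); }
    }

    void stop() {
        synchronized (lock) { stopped = true; lock.notifyAll(); }
    }
}
```

The worker only reacts between units of work, so a long blocking network read would delay a halt or stop request until that read returns.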
Apologies for the long question..
I'm fairly new to Spring and don't understand the inner working fully yet.
So, my current Java project has Spring 4.x code, written back in 2015, that uses a ThreadLocal variable to store some user permission data.
The flow starts as a REST call in a REST controller which then calls the backend code and checks for user permissions from the DB.
There is a @Repository class that has a static instance of ThreadLocal where this user permission is stored. The ThreadLocal variable is updated by the calling thread.
So, if the thread finds data in the ThreadLocal instance already present for it, it just reads that data from the ThreadLocal variable and works away. If not, it goes to DB tables and fetches new permission data and also updates the ThreadLocal variable.
So my understanding is that ThreadLocal variable was used as these user permissions are needed multiple times within the same REST Call. So the idea was for a given REST request since the thread is the same, it needn't fetch user permissions from DB and instead can refer to its entry in the ThreadLocal variable within the same REST request.
Now, this seems to work fine in Spring 4.3.29.RELEASE, as every REST call was being serviced by a different thread. (I printed thread IDs to confirm.)
Spring 4.x ThreadStack up to Controller method call:
com.xxx.myRESTController.getDoc(MyRESTController.java),
org.springframework.web.context.request.async.WebAsyncManager$5.run(WebAsyncManager.java:332),
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
java.util.concurrent.FutureTask.run(FutureTask.java:266),
java.lang.Thread.run(Thread.java:748)]
However, when I upgraded to Spring 5.2.15.RELEASE this breaks when calling different REST endpoints that try to fetch user permissions from the backend.
On printing the Stacktrace in the backend, I see there is a ThreadPoolExecutor being used in Spring 5.x.
Spring 5.x ThreadStack:
com.xxx.myRESTController.getDoc(MyRESTController.java),
org.springframework.web.context.request.async.WebAsyncManager.lambda$startCallableProcessing$4(WebAsyncManager.java:337),
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
java.util.concurrent.FutureTask.run(FutureTask.java:266),
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149),
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624),
java.lang.Thread.run(Thread.java:748)]
So in Spring 5.x, it looks like the same thread is being put back in the ThreadPool and later gets called for multiple different REST calls.
When this thread looks up the ThreadLocal instance, it finds stale data stored by it for an earlier unrelated REST call. So quite a few of my test cases fail due to stale data permissions being read by it.
I read that calling ThreadLocal's remove() clears the calling thread's entry from the variable (which wasn't implemented at the time).
I wanted to do this in a generic way so that all REST calls call the remove() before the REST Response is sent back.
Now, in order to clear the ThreadLocal entry, I tried
writing an Interceptor by implementing HandlerInterceptor but this didn't work.
I also wrote another Interceptor extending HandlerInterceptorAdapter and calling ThreadLocal's remove() in its afterCompletion().
I then tried implementing ServletRequestListener and called the ThreadLocal's remove() from its requestDestroyed() method.
In addition, I implemented a Filter and called remove() in doFilter() method.
All 4 of these implementations failed because, when I printed the thread IDs in their methods, they were exactly the same as each other but different from the thread ID printed in the RestController method.
So, the Thread calling the REST endpoint is a different thread from those being called by the above 4 classes. So the remove() call in the above classes never clears anything from ThreadLocal variable.
Can someone please provide some pointers on how to clear the ThreadLocal entry for a given thread in a generic way in Spring?
As you noticed, both the HandlerInterceptor and the ServletRequestListener are executed in the original servlet container thread, where the request is received. Since you are doing asynchronous processing, you need a CallableProcessingInterceptor.
Its preProcess and postProcess methods are executed on the thread where asynchronous processing will take place.
Therefore you need something like this:
WebAsyncUtils.getAsyncManager(request)
        .registerCallableInterceptor("some_unique_key", new CallableProcessingInterceptor() {
            @Override
            public <T> void postProcess(NativeWebRequest request, Callable<T> task,
                    Object concurrentResult) throws Exception {
                // remove the ThreadLocal
            }
        });
in a method that has access to the ServletRequest and executes in the original servlet container thread, e.g. in a HandlerInterceptor#preHandle method.
Remark: Instead of registering your own ThreadLocal, you can use Spring's RequestAttributes. Use the static method:
RequestContextHolder.currentRequestAttributes()
to retrieve the current instance. Under the hood a ThreadLocal is used, but Spring takes care of setting it and removing it on every thread where the processing of your request takes place (asynchronous processing included).
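The underlying problem can be reproduced without Spring. The sketch below is an illustration only: PERMISSIONS stands in for the static ThreadLocal in the repository class, handle() for one request, and a single-thread executor for the pooled worker thread. It shows why remove() has to run on the same thread that called set(), which is what postProcess gives you.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalCleanupDemo {
    // Stands in for the static ThreadLocal in the repository class.
    static final ThreadLocal<String> PERMISSIONS = new ThreadLocal<>();

    // Simulates one request: caches permissions on first use, optionally cleans up.
    static String handle(String user, boolean cleanUp) {
        try {
            if (PERMISSIONS.get() == null) {
                PERMISSIONS.set("permissions-of-" + user); // simulated database lookup
            }
            return PERMISSIONS.get();
        } finally {
            if (cleanUp) {
                PERMISSIONS.remove(); // must run on the same thread that called set()
            }
        }
    }

    // Runs two "requests" back to back on one pooled thread and
    // returns what the second request saw.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> handle("alice", cleanUp)).get();
            return pool.submit(() -> handle("bob", cleanUp)).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRequestSees(false)); // permissions-of-alice (stale)
        System.out.println(secondRequestSees(true));  // permissions-of-bob
    }
}
```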
Having read a lot of different things and not being totally familiar with the correct terms for some of these things, I am wondering what the correct way is for creating threads at time of tomcat startup that have a specific purpose. I am using a linux system.
I am not trying to create a thread from a servlet, and I have a good idea of why this is bad. When tomcat starts up, I want two different threads that have a distinct purpose to start. The first thread runs on a period, every 30 minutes, to audit back-end data. The second thread is responsible for sending emails. When a request runs a servlet where an email must be sent, instead of holding up the servlet waiting for the email to be sent, I want to send a request to this email thread and move on. So one runs periodically, and one runs on demand. I never need to increase the number of threads performing these tasks, I just need these two going, and for it to be going as long as the webapp is running, and if the webapp has to halt, I don't really care about graceful shutdown.
There are a few approaches I know that I could take:
Create a process running outside of tomcat and define a way to communicate with that process from my servlet. I'd like to avoid this if possible, because I'd like for these two threads to first be directly associated with start up and shut down of the webapp, and for them to be able to access data in the ServletContext, and to not have to ser/des data across a communication channel.
Spawn two threads in the init method of one of my servlets. This seems dirty and hackish, but it would definitely get the job done.
Create a ServletContextListener that extends a ThreadPoolExecutor or other ExecutorService. This seems appealing and correct, and I suppose I could use a fixed thread pool of 2 threads because I don't need any more. But is this the recommended way to do what I am trying to do? I understand that an ExecutorService is really meant for executing Runnables on the fly without incurring the overhead of creating a thread. I don't know if that is totally right, but if it is, I would not be using an executor for its intended purpose.
Maybe there are more methods that I do not know about, or a correct way to implement one of the above.
I will not go into implementing the thread pool itself, but on your question:
Correct way to create task threads in tomcat at startup
As others said, your third approach is almost correct, BUT this depends on your service structure.
I'll give you an example and then explain it:
public class YourServletContextListener implements ServletContextListener {
    @Override
    public void contextDestroyed(ServletContextEvent sce) {
    }

    @Override
    public void contextInitialized(ServletContextEvent sce) {
    }
}
From the Docs:
contextInitialized:
All ServletContextListeners are notified of context initialization
before any filter or servlet in the web application is initialized.
If I understand your approach correctly, then I would ask you: how would you start a service that sends a request to a servlet that has not been initialized yet?
This would work if the services you want to start do not need to communicate directly with any servlet, filter, or data from the web app at startup. Later, once the container has started, they can of course communicate with each other.
As I said before, which way is best depends on the service structure/logic.
One other method may be using a filter:
void init(FilterConfig filterConfig) throws ServletException
Called by the web container to indicate to a filter that it is being
placed into service. The servlet container calls the init method
exactly once after instantiating the filter.
void destroy()
Called by the web container to indicate to a filter that it is being
taken out of service. This method is only called once all threads
within the filter's doFilter method have exited or after a timeout
period has passed. After the web container calls this method, it will
not call the doFilter method again on this instance of the filter.
This method gives the filter an opportunity to clean up any resources
that are being held (for example, memory, file handles, threads) and
make sure that any persistent state is synchronized with the filter's
current state in memory.
But a filter is not designed for such approaches!
Use a Filter if you want to intercept on HTTP requests matching a
specific URL pattern because you want to check/modify the HTTP
request/response. Use a ServletContextListener if you want to
intercept on webapp's startup and/or shutdown.
Please refer to the following Q&A: tomcat 6 thread pool for asynchronous processing.
Also, instead of Executors.newFixedThreadPool you'll probably need Executors.newScheduledThreadPool to create instance of ScheduledExecutorService which is capable of executing repeating tasks.
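As a rough sketch of how the two background jobs from the question could be set up, the class below uses a ScheduledExecutorService for the periodic audit and a single-thread executor for on-demand emails. BackgroundTasks and its members are hypothetical names, and the task bodies are placeholders; in a web app, start() would be called from contextInitialized() and stop() from contextDestroyed().

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder for the two background threads.
class BackgroundTasks {
    private final ScheduledExecutorService auditScheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final ExecutorService mailSender = Executors.newSingleThreadExecutor();
    final AtomicInteger auditsRun = new AtomicInteger();
    final AtomicInteger mailsSent = new AtomicInteger();

    // Periodic task: first run immediately, then once per period.
    void start(long period, TimeUnit unit) {
        // Placeholder for the real back-end audit.
        auditScheduler.scheduleAtFixedRate(auditsRun::incrementAndGet, 0, period, unit);
    }

    // On-demand task: the caller returns immediately, the mail thread does the work.
    void sendMail(String recipient) {
        mailSender.submit(() -> {
            // Placeholder for the real mail call to 'recipient'.
            mailsSent.incrementAndGet();
        });
    }

    // Cancels the periodic audit and lets queued mails finish.
    boolean stop() {
        auditScheduler.shutdownNow();
        mailSender.shutdown();
        try {
            return mailSender.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For the schedule described in the question, the call would be start(30, TimeUnit.MINUTES).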
Hope this helps...
This question has been asked and discussed before, but I want to take it further.
Are servlets singletons or not?
As I understand it, they are instantiated only by the container, but they are still not singletons. Why?
Are servlets single-threaded or multithreaded? (Forget about javax.servlet.SingleThreadModel.)
I.e., what happens when there are multiple requests for a single servlet?
If they are executed concurrently, does that mean it is multithreaded?
And if it is multithreaded, then each thread will have an instance of the servlet, which contradicts the first point!
What I think is:
Whenever there is a new request, the container creates a new thread for the incoming request, say Req1, and in it calls or dispatches control to the service method of the servlet. Now this execution happens concurrently, I guess.
Does this work the same way in an MVC environment (say Struts 1/2, Spring)?
The fact that just one instance exists doesn't mean it isn't multithreaded. Several threads can concurrently call the same method of the same instance.
Servlets are absolutely multithreaded.
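A small sketch of "one instance, many threads" in plain Java, without the servlet API. FakeServlet is a made-up stand-in for a servlet and the thread pool plays the role of the container's request threads. Local variables are per thread, while instance fields are shared and therefore need a thread-safe type or synchronization.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Stands in for a servlet: ONE instance, many threads calling service() at once.
class FakeServlet {
    // Shared between all threads, so it has to be thread-safe.
    private final AtomicInteger hits = new AtomicInteger();

    int service(int input) {
        int local = input * 2;   // local variables live on each thread's own stack
        hits.incrementAndGet();
        return local;
    }

    int hits() {
        return hits.get();
    }
}

class SingleInstanceDemo {
    // Calls service() on one shared instance from several threads and returns the hit count.
    static int run(int threads, int callsPerThread) {
        FakeServlet servlet = new FakeServlet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    servlet.service(i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return servlet.hits();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 1000)); // 8000
    }
}
```

With a plain int field and hits++ instead of AtomicInteger, the final count could come out lower because concurrent updates can be lost.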
Servlets are multithreaded - this is the base for their efficiency.
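One can use "implements SingleThreadModel" to make a servlet single-threaded: the container then guarantees that no two threads execute the service method of the same instance concurrently, either by serializing requests or by maintaining a pool of servlet instances.
The SingleThreadModel interface has been deprecated (since Servlet 2.4).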
It depends on the implementation. The servlet interface doesn't specify.
http://docs.oracle.com/javaee/5/api/javax/servlet/Servlet.html
However, the HttpServlet documentation notes that servlets typically run on multithreaded servers, so a servlet must handle concurrent requests and synchronize access to shared resources.
http://docs.oracle.com/javaee/5/api/javax/servlet/http/HttpServlet.html
Perhaps, the diagram given in the link below illustrates everything...
http://www.tutorialspoint.com/servlets/servlets-life-cycle.htm
I have developed the following program/architecture:
A) A Java servlet receives POST requests, gets the Parameters from the POST requests and stores them in a public static LinkedList:
public static LinkedList<myObjects> incomingQueue = new LinkedList<>();
That is, for every POST request I do this:
incomingQueue.push(myObject);
Now, I want to periodically access the Queue and perform processing on the Objects:
while(true){
doProcessing(incomingQueue);
wait(someTime);
}
Obviously, I don't have a main class to do this. How do I create such a class that has access to the incomingQueue without being triggered by the servlet? What is the correct architecture to do this?
Thank you for your time.
First of all, the queue should be placed in the servlet context attributes (see ServletContext.setAttribute()). Also, access to this queue must be synchronized; consider ArrayBlockingQueue.
In plain servlets you can use a ServletContextListener, starting a thread in contextInitialized() and interrupting it in contextDestroyed().
If you are using Spring you can use the @Scheduled annotation; in EJB, TimerService or @Schedule.
Finally, there is a Timer class in standard Java. Last but not least, have a look at JMS; it might be a better choice in your situation.
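The listener-plus-thread idea can be sketched in plain Java as below. This is a sketch under assumptions: QueueProcessor is a hypothetical name, String stands in for the queued objects, and appending to a list stands in for doProcessing. It uses LinkedBlockingQueue rather than ArrayBlockingQueue only to avoid choosing a capacity. In a web app, start() would be called from contextInitialized() and stop() from contextDestroyed(), with the instance stored as a servlet context attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical consumer that owns the queue and the background thread.
class QueueProcessor {
    private final BlockingQueue<String> incoming = new LinkedBlockingQueue<>();
    // Touched only by the worker thread, so a plain list is enough.
    private final List<String> processed = new ArrayList<>();
    private final Thread worker = new Thread(this::drain, "queue-processor");

    void start() {
        worker.start();
    }

    // Called from request threads; BlockingQueue is safe for concurrent use.
    void enqueue(String item) {
        incoming.add(item);
    }

    private void drain() {
        try {
            while (true) {
                processed.add(incoming.take()); // blocks until an item arrives
            }
        } catch (InterruptedException e) {
            // Interrupted by stop(): handle whatever is still queued, then exit.
            incoming.drainTo(processed);
        }
    }

    // Interrupts the worker, waits for it to exit, and returns what it processed.
    List<String> stop() {
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }
}
```

Because take() blocks until an item is available, the polling loop with a wait from the question is not needed.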
You have several options:
Use a scheduling library like Quartz
If you don't want to use a separate library, you should add a listener to your web.xml that implements ServletContextListener and starts a separate thread in contextInitialized().
Also: note the comment by @BrianRoach. The point about the synced list is rather important.
You need to synchronize your methods for concurrent access.
A very hard core solution would be to implement it like producer and consumer. Here is an example that uses stack and 1 producer and 3 consumers.
A much neater solution would be to use JMS.
My question is about threads being queued. For my example I have one Spring context. I have a method named CalculateTax in a stateless class. A request comes in, a thread is created (tA) and it eventually enters the CalculateTax method. Within the same "time frame" another request comes in and another thread is created (tB). Now, here is what I want to understand. AFAIK tB cannot execute CalculateTax until tA has exited the method. Is this true?
As long as CalculateTax only uses local variables (i.e. declared in the method), you will not have any thread sync issues and multiple threads can call the method without a problem.
However if for some reason CalculateTax uses variables defined at the class level, and you are using the Singleton pattern (you tagged your question with "singleton", so I guess you are), you may have thread sync issues.
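The following sketch shows that two threads can be inside the same method of the same instance at the same time. TaxService, the 25% rate, and the latch parameter are made up for the demonstration: each call waits inside the method until the other call has entered too, which could never succeed if calls were queued one after another.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class TaxService {
    // Stateless: works only on its parameters and local variables.
    // The latch exists purely to prove that both calls overlap.
    double calculateTax(double amount, CountDownLatch bothInside) throws InterruptedException {
        bothInside.countDown();
        // Wait until the other thread is inside this method as well.
        if (!bothInside.await(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("calls were serialized");
        }
        double rate = 0.25; // hypothetical tax rate
        return amount * rate;
    }
}

class ConcurrentCallDemo {
    static double[] run() {
        TaxService service = new TaxService(); // one shared instance, like a singleton bean
        CountDownLatch bothInside = new CountDownLatch(2);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Double> a = pool.submit(() -> service.calculateTax(100.0, bothInside));
            Future<Double> b = pool.submit(() -> service.calculateTax(200.0, bothInside));
            return new double[] { a.get(), b.get() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] result = run();
        System.out.println(result[0] + " " + result[1]); // 25.0 50.0
    }
}
```

Only if the method were declared synchronized (or guarded by another lock) would the second thread have to wait for the first to leave.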
No, it is not true. If they are parallel threads, each thread has its own execution stack, so tB can execute the method while tA is still executing it.
This is what Threads are for.
Generally speaking, the answer is undefined. If your 'request' comes from a remote client, the answer depends on the implementation details of the mechanism used to expose the service.
However, I'm not aware of any remote communication framework whose proxy actually serializes requests; that is assumed to be handled by the developer of the target service (e.g. it's your task to make the service implementation thread-safe OR to serialize all requests using explicit synchronization).