SolrJ Thread Safety


I am using CommonsHttpSolrServer in a Web Application. Is it safe to reuse the CommonsHttpSolrServer over multiple requests or should I instantiate a new object for each request? Could not find the answer in the API docs.

According to the documentation and the source comments, SolrJ is thread safe.
However, be careful when you update Solr. According to this post, transactions are implemented per instance, not per queue, so each thread does not get its own isolated transaction. A rollback rolls back every call made since the last commit, regardless of which thread issued it.
Overall, this means that you should be safe to query (using the same CommonsHttpSolrServer) with as many threads as you like. However, if you wish to take advantage of rollback, you will need to ensure that only one thread is updating your Solr instance at a time (regardless of how the client objects are distributed across threads).

Related

Creating Threads with java in AppEngine Standard Environment

I'm new to Google Cloud Platform. I'm using the App Engine standard environment. I need to create threads in Java, but I think it's not possible. Is it?
Here is the situation:
I need to create Feeds for users.
There are three databases with names d1, d2, d3.
Whenever a user sends a request for feeds, Java creates three threads, one for each database: for example, t1 for d1, t2 for d2 and t3 for d3. These threads must run asynchronously for better performance; afterwards, the data from the three threads is combined and sent back to the user in the response.
I know how to write code for this, but as you know I need threads for this work. If AppEngine standard Env. doesn't allow it then what can I do? Is there any other way?
The GCP documentation says:
To avoid using threads, consider Task Queues
I read about Task Queues. There are two types of queues: Push and Pull. Both run asynchronously but they do not send a response back to the user. I think they are only designed to complete tasks in the background.
Can you please let me know how I can achieve my goal? What do I need to learn for this?

Note: this answer is based solely on the documentation; I'm not a Java user.
Threads are supported by the standard environment, but with restrictions. From Threads:
Caution: Threads are a powerful feature that are full of surprises. To learn more about using threads with Java, we recommend Goetz, Java Concurrency in Practice.
A Java application can create a new thread, but there are some restrictions on how to do it. These threads can't "outlive" the request that creates them.
An application can
Implement java.lang.Runnable.
Create a thread factory by calling com.google.appengine.api.ThreadManager.currentRequestThreadFactory().
Call the factory's newRequestThread method, passing in the Runnable, newRequestThread(runnable), or use the factory object returned by com.google.appengine.api.ThreadManager.currentRequestThreadFactory() with an ExecutorService (e.g., call Executors.newCachedThreadPool(factory)).
However, you must use one of the methods on ThreadManager to create your threads. You cannot invoke new Thread() yourself or use the default thread factory.
An application can perform operations against the current thread, such as thread.interrupt().
Each request is limited to 50 concurrent request threads. The Java runtime will throw a java.lang.IllegalStateException if you try to create more than 50 threads in a single request.
When using threads, use high level concurrency objects, such as Executor and Runnable. Those take care of many of the subtle but important details of concurrency like Interrupts and scheduling and bookkeeping.
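As a rough illustration of the fan-out described in the question, the sketch below runs one task per database on threads from a supplied ThreadFactory and combines the results. FeedAggregator and fetchAll are hypothetical names, and the three database queries are stood in for by plain Callables. On App Engine the factory would presumably be the one returned by ThreadManager.currentRequestThreadFactory(), as the quoted documentation describes; the sketch takes it as a parameter so it also runs outside App Engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper: not part of any App Engine API.
class FeedAggregator {

    // Runs every task on a thread created by the given factory and
    // concatenates the results in the order the tasks were supplied.
    static List<String> fetchAll(ThreadFactory factory,
                                 List<Callable<List<String>>> tasks) {
        ExecutorService pool = Executors.newCachedThreadPool(factory);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Callable<List<String>> task : tasks) {
                futures.add(pool.submit(task)); // all tasks start running here
            }
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                combined.addAll(future.get()); // waits for that task to finish
            }
            return combined;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for feeds", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a feed query failed", e.getCause());
        } finally {
            // Request threads must not outlive the request, so stop the pool.
            pool.shutdownNow();
        }
    }
}
```

Because the method waits on every Future before returning, the combined result can be written into the same HTTP response, which is the part Task Queues do not provide.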

An elegant way to implement what you need would be to create a parameterized endpoint in your application:
/runFeed?db=d1
From your "main" application code, you can then make a fetchAsync call on URLFetchService, which returns a java.util.concurrent.Future<HTTPResponse>.
This gives you better monitoring of what your application does.
On the downside, it adds network latency to your application and increases its cost, since URLFetchService is not free.

Closing an HTTP Session for Writing in Java / Tomcat

When working on an ASP.NET application, I discovered that placing something in the session cache, or really, accessing variables in the session cache, caused my Ajax queries to stop being asynchronous. I learned that this was because the session basically blocks - if I fire two Ajax requests from my browser at the same time, and the first one takes a bit to return, the session is locked in the first request until that request is completed, at which point my second Ajax request starts working.
In PHP I gather that there is an option to close the session for writing (and / or open it in a read-only way) so that session variable access is non blocking and things stay asynchronous.
I'm building an application that will be Java, probably running on Tomcat (though I could change to some other container if I needed) and I am not able to find out whether Java has the same issue (session variable reads block) or has the same remedy (early close, read only mode). Has anyone encountered that issue before?

In Tomcat, HttpSession is implemented in org.apache.catalina.session.StandardSession (source here).
If you look at the source, you will see that calls to HttpSession.getAttribute(String) and HttpSession.setAttribute(String, Object) are pretty much channelled to a ConcurrentHashMap without any additional synchronization.
This means that these calls inherit the contract of ConcurrentHashMap. Quoting its Javadoc:
retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. <..> Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove)
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary.

It looks like the blocking is caused by thread synchronization on access to HttpSession, as described in this SO answer.
So the second request should be blocked only while the first one is working with the HttpSession (or if you have some shared lock that the first request holds for a long time, but that has nothing to do with Tomcat).
Since this synchronization is required by the Servlet spec, you shouldn't try to violate it. Instead, design your app so that it minimizes the time it needs to read from or write to the HttpSession.
Additionally, as I wrote above, blocking may occur if you have an additional lock that makes several requests execute sequentially. Take several thread dumps of Tomcat after you have sent the second request and see whether the second request is waiting on any such lock.

Java Multithreaded Caching with Single Updater Thread

I have a web service that has ~1k request threads running simultaneously on average. These threads access data from a cache (currently Ehcache). When an entry in the cache expires, the thread that hits the expired entry tries to get the new value from the DB, while the other threads that also try to hit this entry block, i.e. I use the BlockingEhCache decorator. Instead of having the other threads wait on the "fetching thread," I would like them to use the "stale" value corresponding to the "missed" key. Are there any third-party Ehcache decorators for this purpose? Do you know of any other caching solutions that have this behavior? Other suggestions?

I don't know EHCache well enough to give specific recommendations for solving your problem with it, so I'll outline what I would do without EHCache.
Let's assume all the threads access this cache through a service interface called FooService, implemented by a service bean called SimpleFooService. The service has the methods required to get the data needed (which is also cached). This way you hide the fact that the data is cached from the frontend (the HTTP request objects).
Instead of simply storing the data to be cached in a property of the service, we'll make a special object for it. Let's call it FooCacheManager. It stores the cache in a property of FooCacheManager (let's say it's of type Map) and has getters to read the cache. It also has a special method called reload(), which loads the data from the DB (by calling service methods to get the data, or through the DAO) and replaces the content of the cache (saved in that property).
The trick here is as follows:
Declare the cache property in FooCacheManager as an AtomicReference (a class introduced in Java 1.5). This guarantees thread safety when you read from it and assign to it. Your reads and writes will never collide, and a reader will never see a half-written value.
reload() first loads the data into a temporary map, and when it has finished, it assigns the new map to the property saved in FooCacheManager. Since the property is an AtomicReference, the assignment is atomic, so it swaps the map in an instant without any need for locking.
TTL implementation: have FooCacheManager implement the Quartz Job interface, making it effectively a Quartz job. In the execute method of the job, have it run reload(). In the Spring XML, define this job to run every xx minutes (your TTL), which can also be defined in a property file if you use PropertyPlaceholderConfigurer.
This method is effective since the reading threads:
Don't block for read
Don't call isExpired() on every read, which is 1k / second.
Also the writing thread doesn't block when writing the data.
If this wasn't clear, I can add example code.
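A minimal sketch of the idea, assuming a String-to-String cache and a hypothetical loader function standing in for the service or DAO call. The scheduling part (the Quartz job that calls reload() every TTL interval) is left out; any scheduler that invokes reload() periodically would do.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class FooCacheManager {
    // Readers always see either the old map or the new map, never a mix.
    private final AtomicReference<Map<String, String>> cache =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // Hypothetical stand-in for the DB access (service method or DAO).
    private final Supplier<Map<String, String>> loader;

    FooCacheManager(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Non-blocking read; returns the possibly stale value while a reload runs.
    String get(String key) {
        return cache.get().get(key);
    }

    // Builds the new map off to the side, then swaps it in with one atomic write.
    void reload() {
        Map<String, String> fresh = new HashMap<>(loader.get());
        cache.set(Collections.unmodifiableMap(fresh));
    }
}
```

The published map is wrapped as unmodifiable and never changed after the swap, which is what makes lock-free reads safe here: the only shared mutation is the reference assignment itself.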

Since Ehcache removes stale data, a different approach is to refresh data with a probability that increases as the expiration time approaches and is 0 if the expiration time is "sufficiently" far away.
So, if thread 1 needs some data element, it might refresh it even though the data is not old yet.
If thread 2 needs the same data in the meantime, it can use the existing data (while the refreshing thread has not finished yet). It is possible that thread 2 tries to do a refresh too.
If you are working with references (the updater thread loads the object and then simply changes the reference in the cache), then no separate synchronization is required for get and set operations on the cache.
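One possible shape for such a probability, as a sketch only: the answer does not specify a formula, so the linear ramp, the refresh window parameter, and the names EarlyRefreshPolicy, refreshProbability and shouldRefresh are all assumptions made for illustration.

```java
import java.util.Random;

// Hypothetical policy: the linear ramp is an assumption, not part of Ehcache.
class EarlyRefreshPolicy {

    // Returns 0 while expiration is at least windowMillis away, then rises
    // linearly to 1 at the expiration time.
    static double refreshProbability(long nowMillis, long expiresAtMillis, long windowMillis) {
        long remaining = expiresAtMillis - nowMillis;
        if (remaining >= windowMillis) {
            return 0.0; // "sufficiently" far from expiration
        }
        if (remaining <= 0) {
            return 1.0; // already expired
        }
        return 1.0 - (double) remaining / windowMillis;
    }

    // Each reading thread rolls the dice independently.
    static boolean shouldRefresh(long nowMillis, long expiresAtMillis,
                                 long windowMillis, Random random) {
        return random.nextDouble() < refreshProbability(nowMillis, expiresAtMillis, windowMillis);
    }
}
```

With roughly 1k reads per second, even a small probability triggers a refresh quickly once the window opens, so the window can probably be kept short; two threads may still occasionally refresh the same entry, as noted above.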

Using a JMS Session from different threads

The javadoc for Session states:
A Session object is a single-threaded context for producing and consuming messages.
So I understand that you shouldn't use a Session object from two different threads at the same time. What I'm unclear on is whether you can use the Session object (or children such as a Queue) from a different thread than the one that created it.
In the case I'm working on, I'm considering putting my Session objects into a pool of available sessions that any thread could borrow from, use, and return to the pool when it is finished with it.
Is this kosher?
(Using ActiveMQ BTW, if that impacts the answer at all.)

I think the footnote from section 4.4 in the JMS 1.1 spec sheds some light:
There are no restrictions on the number of threads that can use a Session object or those it creates. The restriction is that the resources of a Session should not be used concurrently by multiple threads. It is up to the user to insure that this concurrency restriction is met. The simplest way to do this is to use one thread. In the case of asynchronous delivery, use one thread for setup in stopped mode and then start asynchronous delivery. In more complex cases the user must provide explicit synchronization.
By my reading of the spec what you want to do is OK, provided you correctly manage concurrency.
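Under this reading of the spec, the pool itself can enforce the "no concurrent use" rule. The sketch below is a generic borrow-and-return pool built on a BlockingQueue; BorrowPool and withResource are hypothetical names, and the pooled type would be the JMS Session in the scenario above (it is left generic here so the sketch needs no JMS library).

```java
import java.util.Collection;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical pool: each resource is held by at most one thread at a time.
class BorrowPool<T> {
    private final BlockingQueue<T> idle;

    // The collection must be non-empty; it supplies the whole pool up front.
    BorrowPool(Collection<T> resources) {
        idle = new ArrayBlockingQueue<>(resources.size(), false, resources);
    }

    // Borrows a resource, runs the work on the calling thread, and always
    // returns the resource, even if the work throws.
    <R> R withResource(Function<T, R> work) {
        T resource;
        try {
            resource = idle.take(); // blocks while every resource is in use
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
        try {
            return work.apply(resource);
        } finally {
            idle.add(resource);
        }
    }
}
```

Handing the resource over through the queue also gives the borrowing thread a consistent view of whatever the previous borrower did to it. This sketch does not address sessions that have a registered message listener.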

Sadly the JMS docs are often not written as clearly or precisely as we might like :o(
But reading the spec I'm now pretty convinced you really shouldn't access the session from other threads, even if you guarantee there's no concurrent access. The bit of the javadoc that swung it for me was:
Once a connection has been started, any session with a registered message listener(s) is dedicated to the thread of control that delivers messages to it. It is erroneous for client code to use this session or any of its constituent objects from another thread of control. The only exception to this is the use of the session or connection close method.
Note the clear use of 'thread of control' and the singling out of 'close()' as the only exception.
They seem to be saying that if you're using asynchronous message consumption (i.e. setMessageListener), which means you get called back on another thread created by JMS to receive messages, you're never allowed to touch the session or related objects again from any other thread, because the session is now 'dedicated' to the JMS delivery thread. For example, I assume this means you couldn't even call message.acknowledge() from another thread.
Having said that, I only just noticed that we haven't been obeying this constraint, and have yet to notice any ill effects (using SonicMQ). But of course if you don't obey the standard, all bets are off, so I guess we need to obey the 1-thread 1-session rule to stay safe.

Testing simultaneous calls to transactional service

How should I test a service method that is transactional for its simultaneous use (it updates a database row by decreasing a value)?
I have set up a JUnit test class with SpringJUnit4ClassRunner, and the components are @Autowired.
Just spawning threads which would call the method doesn't seem to work. I'm not sure whether this has something to do with the Spring proxy mechanism.
What I would like to achieve is a situation where two threads are "inside" the tested method simultaneously and one of them fails and rolls back. For example, the row value is 3 and both method calls try to decrease the value by 2; if the method didn't work correctly, the value would become -1, which is illegal. I want either both calls to fail and roll back, or only the one that tries to update an instant later than the other to fail.
Is this even possible?

The first problem is that the transaction context is bound to one thread (with a thread local), so you have to start a transaction in each of your threads. (I think there is no support for this in Spring. You can start a transaction programmatically with the transaction manager.)
The code you described (read, decrement, write) only works with the right isolation level (serializable or repeatable read would work).
Once this setup is done, you can test the behavior by blocking one thread while it holds the database lock. You can use a latch for this.
The thread without the database lock will still not roll back at this point. It will block until the database lock is available again. The scheme you're describing is quite similar to optimistic concurrency control, so maybe this is already implemented.
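A sketch of the latch part, under stated assumptions: ConcurrentCallHarness and InMemoryRow are hypothetical names, and the synchronized in-memory row merely stands in for the transactional service and its database lock. This variant uses latches as a start gate, so that all threads call the method at the same moment; blocking a thread while it holds the lock, as described above, would need a second latch inside the service call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class ConcurrentCallHarness {
    // Starts the given number of threads, releases them together, and
    // returns how many calls reported success.
    static int runTogether(int threads, BooleanSupplier call) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads);
        CountDownLatch go = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger successes = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                ready.countDown();
                try {
                    go.await(); // every worker waits at the gate
                    if (call.getAsBoolean()) {
                        successes.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (RuntimeException e) {
                    // counted as a failed (rolled back) call
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            ready.await();  // all workers are running
            go.countDown(); // open the gate for everyone at once
            done.await();   // wait for every call to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running concurrent calls", e);
        } finally {
            pool.shutdownNow();
        }
        return successes.get();
    }
}

// Stand-in for the database row; synchronized plays the role of the row lock.
class InMemoryRow {
    private int value;

    InMemoryRow(int initial) {
        value = initial;
    }

    synchronized boolean decrease(int amount) {
        if (value - amount < 0) {
            return false; // would go negative: reject, like a rollback
        }
        value -= amount;
        return true;
    }

    synchronized int value() {
        return value;
    }
}
```

In a real test, the BooleanSupplier would call the service inside a programmatically started transaction for each thread, and the assertion would be the same: with a starting value of 3 and two decrements of 2, exactly one call succeeds.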
