By "clustered environment" I mean the same code running on multiple server machines. The scenario I can think of is as follows.
Multiple requests to update card details based on expiry time arrive from different threads at the same time. A snippet of the code follows:
synchronized (card) { // card object
    if (card.isExpired()) {
        updateCard();
    }
}
My understanding is that a synchronized block works at the JVM level, so how is this achieved in a multi-server environment?
Please suggest edits to rephrase the question. I asked it as best I can recollect from a question that was put to me.
As you said, a synchronized block only affects threads in the "local JVM".
When it comes to a cluster, it is up to you how you drive your distributed transaction.
It really depends where your objects (e.g. card) are stored.
Database - You will probably need to use some locking strategy: most likely optimistic locking, which stores a version of the entity and checks it whenever a change is made, or the "safer" pessimistic locking, where you lock the whole row while making changes.
Memory - You will probably need some memory grid solution (e.g. Hazelcast...) and make use of its transaction support, or implement it yourself.
Any other? You will have to specify...
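To make the optimistic-locking option a bit more concrete, here is a minimal in-memory sketch. `CardRow`, `VersionedCardStore` and `updateIfVersionMatches` are hypothetical names invented for illustration, and the map only stands in for a shared database table. With a real database the same check would typically be a conditional update on the version column.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical immutable row: card state plus a version counter.
final class CardRow {
    final boolean expired;
    final long version;

    CardRow(boolean expired, long version) {
        this.expired = expired;
        this.version = version;
    }
}

// Stand-in for a shared database table (an assumption for illustration only).
final class VersionedCardStore {
    private final ConcurrentMap<String, CardRow> rows = new ConcurrentHashMap<>();

    void insert(String id, boolean expired) {
        rows.put(id, new CardRow(expired, 0));
    }

    CardRow read(String id) {
        return rows.get(id);
    }

    // Succeeds only if nobody changed the row since expectedVersion was read.
    boolean updateIfVersionMatches(String id, long expectedVersion, boolean expired) {
        CardRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // stale read: the caller should re-read and retry
        }
        // replace(key, old, new) is atomic; CardRow uses identity equality here.
        return rows.replace(id, current, new CardRow(expired, expectedVersion + 1));
    }
}

final class OptimisticLockDemo {
    public static void main(String[] args) {
        VersionedCardStore store = new VersionedCardStore();
        store.insert("card-1", true);

        // Two servers read the same version before either of them writes.
        CardRow seenByServerA = store.read("card-1");
        CardRow seenByServerB = store.read("card-1");

        boolean a = store.updateIfVersionMatches("card-1", seenByServerA.version, false);
        boolean b = store.updateIfVersionMatches("card-1", seenByServerB.version, false);
        System.out.println("A updated: " + a + ", B updated: " + b);
    }
}
```

The second writer is rejected because its version is stale, so it has to re-read the card and decide again whether an update is still needed.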
See, in a clustered environment, you will usually have multiple JVMs running the same code. If traffic is high, then actually the number of JVMs could auto-scale and increase (new instances could be spawned). This is one of the reasons why you should be really careful when using static fields to keep data in a distributed environment.
Next, coming to your actual question: if you have a single JVM serving requests, then all other threads will have to wait to get that lock. If you have multiple JVMs running, then a lock acquired by one thread on one JVM will not prevent acquisition of the (in reality, not same, but conceptually same) lock by another thread in a different JVM.
I am assuming you want to ensure that only one thread can edit the object or perform the action (based on the method name, i.e. updateCard). I suggest you implement optimistic locking (versioning), which Hibernate can do quite easily, to prevent one writer from silently overwriting another's change.
I encountered the following question in a recent System Design Interview:
Design an AppServer that interfaces with a Cache and a DB.
I came up with this:
public class AppServer {
    public Database DB;
    public Cache cache;

    public Value get(Key k) {
        Value res = cache.get(k);
        if (res == null) {
            // cache miss: fall back to the database and populate the cache
            res = DB.get(k);
            cache.set(k, res);
        }
        return res;
    }

    public void set(Key k, Value v) {
        cache.set(k, v);
        DB.set(k, v);
    }
}
This code is fine and works correctly, but follow ups to the question are:
What if there are multiple threads?
What if there are multiple instances of the AppServer?
Suddenly AppServer performance degrades a ton, we find out this is because our cache is consistently missing. Cache size is fixed (already largest that it can be). How can we prevent this?
Response:
I answered that we can use locks or condition variables. In Java, we can add synchronized to each method to get mutual exclusion, but the interviewer mentioned that this isn't very efficient and wanted only the critical parts synchronized.
I thought that we only need to synchronize the two set lines in void set(Key k, Value v) and the one set call in Value get(Key k); however, the interviewer pushed for also synchronizing res = DB.get(k);. I agreed with him in the end, but I don't fully understand. Don't threads have independent stacks and a shared heap? So when a thread executes get, it stores res in a local variable on its stack frame; even if another thread executes get sequentially, the former thread retains its own value. Then each thread sets its respective fetched value.
How can we handle multiple instances of the AppServer?
I came up with a distributed queue solution like Kafka: every time we perform a set / get command, we queue that command. He mentioned that set is OK because the action sets a value in the cache / DB, but how would you return the correct value for get? Can someone explain this?
Are there also possible solutions using a versioning system or an event system?
Possible solutions:
L1, L2, L3 caches - layers and more caches
Regional / Segmentation caches - use different cache for user groups.
Any other ideas?
Will upvote all insightful responses :)
1
Although JDBC is "supposed" to be thread safe, some drivers aren't, and I'm going to assume that Cache isn't thread safe either (although most caches should be). In that case, you would need to make the following changes to your code:
Make both fields final
Synchronize the ENTIRE get(...) method
Synchronize the ENTIRE set(...) method
Assuming there is no other way to access the said fields, the behavior of your get(...) method depends on 2 things: first, that updates from the set(...) method can be seen, and secondly, that a cache miss is then stored only by a single thread. You need to synchronize because the idea is to only have one thread perform an expensive DB query in the case that there is a cache miss. If you do not synchronize the entire get(...) method, or you split the synchronized statement, it is possible for another thread to also see a cache miss between the lookup and insertion.
The way I would answer this question is honestly just to toss the entire thing. I would look at how JCIP (Java Concurrency in Practice) wrote its cache and base my answer on that.
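As a rough illustration of the "only one thread should pay for a cache miss" idea, here is a sketch that swaps the synchronized method for ConcurrentHashMap.computeIfAbsent. It is not the cache from JCIP and not the code from the question. `ReadThroughCache` and the `loader` function (standing in for DB.get) are assumptions made up for the example, and set(...) is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Read-through cache: a miss loads from the backing store at most once per key.
final class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical stand-in for DB.get(k)

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent performs lookup and insertion atomically per key,
        // so concurrent misses on the same key trigger a single load.
        return cache.computeIfAbsent(key, loader);
    }
}

final class ReadThroughCacheDemo {
    // Returns how many times the loader ran while several threads missed on one key.
    static int loadsForConcurrentGets(int threads) {
        AtomicInteger loads = new AtomicInteger();
        ReadThroughCache<String, String> cache = new ReadThroughCache<String, String>(key -> {
            loads.incrementAndGet(); // count simulated database queries
            return key.toUpperCase();
        });
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> cache.get("card-1"));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loads.get();
    }

    public static void main(String[] args) {
        System.out.println("loader calls: " + loadsForConcurrentGets(8));
    }
}
```

One caveat of this approach: a loader that returns null leaves no entry behind, so missing keys would be looked up again on every call.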
2
I think your queue solution is fine.
I believe your interviewer means that if one instance of AppServer had not cached a value that was already set(...) by another instance, it would look it up and find the correct value in the DB. This solution would be incorrect if you are using multiple threads, because two threads could be set(...)ing conflicting values. The caches would then hold two different values, and depending on the thread safety of your DB, it might not even have the value at all.
Ideally, you'd never create more than a single instance of your AppServer.
3
I don't have enough experience to evaluate this question specifically, but perhaps an LRU cache would improve performance somewhat, or using a hash ring buffer. It might be a stretch, but if you wanted to throw it out there, perhaps even using ML to determine the best values to preload or retain at certain times of the day could also work.
If you are always missing values from your cache, there is no way to improve your code. Performance would be dependent on your database.
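For reference, an LRU cache along the lines suggested above can be sketched in a few lines on top of LinkedHashMap in access order. `LruCache` is a hypothetical name, and the class is not thread safe by itself, so it would still need external synchronization in the AppServer scenario.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Fixed-capacity cache that evicts the least recently used entry.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; true evicts the eldest entry.
        return size() > capacity;
    }
}

final class LruCacheDemo {
    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a", so "b" becomes the eldest entry
        cache.put("c", 3); // exceeds capacity and evicts "b"
        System.out.println(cache.keySet());
    }
}
```

Whether this helps depends on the access pattern: if the working set is much larger than the cache, any eviction policy will still miss most of the time.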
I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects. Each match object contains information about the game and game state for all players (anywhere from 2-5 in one match). The server will pass the same match object for each user's connection to the server.
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The problem with that is there would be a lot of variables/lists in the object, all of which would need to be synchronized. Some of them may need to be used to perform calculations that affect each other, meaning I would need nested synchronized blocks, which I don't want.
Are synchronized blocks for every variable in the match object my only solution, or can I do something else?
I know SQLite has an in memory mode, but the problem I found was this:
Quote from their website:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
A few dozen milliseconds? That's a long time. Would that be fast enough, or is there another in memory database that would be suited for real time games?
Your architecture is off in this case. You want a set of data to be modified and updated by several threads at once, which might be possible, but is extremely difficult to get right and fast at the same time.
It would be much easier if you changed the architecture as follows:
There is one thread that has exclusive access to a single match object. A thread could handle multiple match objects, but a single match object will only be handled/guarded by a single thread. Now if any external effect wants to change any values, it needs to make a "change request", but cannot change them immediately on its own. And once the change has been implemented and the values updated, the thread guarding the match object will send out an update to the clients.
So let's say a player scores a goal; the client thread then calls a function:
void clientScoredGoal(Client client) {
    actionQueue.put(new GoalScoredEvent(client));
}
Here actionQueue is, e.g., a BlockingQueue.
The thread handling the match objects is listening on this queue via actionQueue.take() and reacts as soon as a new action has been found. It will then apply the change, update internal values if necessary, and then distribute an update package (a "change request" to clients if you want).
Also, in general, synchronized should be considered bad practice in Java. There are certain situations where it is a good way to handle synchronization, but in like 99% of all cases using features from the java.util.concurrent package will be by far the better solution. Notice the complete lack of synchronized in the example code above, yet it is perfectly thread-safe.
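To show the owner-thread side of this design as well, here is a self-contained sketch. It simplifies the event objects to plain Runnable actions, and `Match`, `MatchLoop`, the `STOP` marker and `stopAndGetGoals` are hypothetical names invented for the example, so treat it as one possible shape rather than the definitive implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical match state; only the owner thread ever mutates it.
final class Match {
    private int goals;

    void applyGoal() { goals++; }

    int goals() { return goals; }
}

final class MatchLoop {
    private static final Runnable STOP = () -> { }; // marker action that ends the loop
    private final BlockingQueue<Runnable> actionQueue = new LinkedBlockingQueue<>();
    private final Match match = new Match();
    private final Thread owner = new Thread(this::run, "match-owner");

    void start() { owner.start(); }

    // Called from any client thread: it only enqueues a change request.
    void clientScoredGoal() {
        actionQueue.add(match::applyGoal);
    }

    private void run() {
        try {
            while (true) {
                Runnable action = actionQueue.take(); // blocks until an action arrives
                if (action == STOP) {
                    return;
                }
                action.run(); // the only place where match state changes
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Processes everything queued so far, stops the owner, returns the final count.
    int stopAndGetGoals() {
        actionQueue.add(STOP);
        try {
            owner.join(); // join() makes the owner's writes visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return match.goals();
    }
}

final class MatchLoopDemo {
    static int play(int clients, int goalsEach) {
        MatchLoop loop = new MatchLoop();
        loop.start();
        Thread[] threads = new Thread[clients];
        for (int i = 0; i < clients; i++) {
            threads[i] = new Thread(() -> {
                for (int g = 0; g < goalsEach; g++) {
                    loop.clientScoredGoal();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return loop.stopAndGetGoals();
    }

    public static void main(String[] args) {
        System.out.println("goals: " + play(4, 250));
    }
}
```

Because only the owner thread touches the Match fields, they need neither synchronized nor volatile; the queue is the single hand-off point between threads.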
The question is very generic, so it is difficult to give specific advice.
I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects.
If you want to store "match" objects in a Map and then have multiple threads requesting/adding/removing objects from the map, then you have to use a "ConcurrentHashMap".
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The safest and easiest way to have multithreading is to make each "match" an immutable object, then there is no need to synchronize.
If "match" information is mutable and accessed simultaneously by many threads, then you will have to synchronize. But in this case, the "mutable state" is contained within a "match", so only the class "match" will need to use synchronization.
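A small sketch of what "only the match class synchronizes" could look like, combined with a ConcurrentHashMap for the registry. The names `GameMatch`, `MatchRegistry` and `recordMove` are invented for illustration. Fields that affect each other are updated under one private lock, which avoids nested synchronized blocks.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// All mutable state lives inside the match and is guarded by one private lock.
final class GameMatch {
    private final Object lock = new Object();
    private int score;
    private int moves;

    // Related fields change together under a single lock, so no nesting is needed.
    void recordMove(int points) {
        synchronized (lock) {
            moves++;
            score += points;
        }
    }

    // Returns {score, moves} as one consistent pair.
    int[] snapshot() {
        synchronized (lock) {
            return new int[] {score, moves};
        }
    }
}

// Thread-safe lookup and creation of matches by id.
final class MatchRegistry {
    private final ConcurrentMap<String, GameMatch> matches = new ConcurrentHashMap<>();

    GameMatch matchFor(String id) {
        return matches.computeIfAbsent(id, key -> new GameMatch());
    }
}

final class MatchRegistryDemo {
    // Several "connection" threads record moves on the same match.
    static int[] play(int threads, int movesEach) {
        MatchRegistry registry = new MatchRegistry();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int m = 0; m < movesEach; m++) {
                    registry.matchFor("match-1").recordMove(2);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return registry.matchFor("match-1").snapshot();
    }

    public static void main(String[] args) {
        int[] result = play(4, 500);
        System.out.println("score=" + result[0] + ", moves=" + result[1]);
    }
}
```

Callers never see the lock, so the locking policy can later be changed inside GameMatch without touching the connection-handling code.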
I would need nested synchronized blocks, which I don't want.
I haven't ever seen the need to have nested synchronized blocks. Perhaps you should refactor your solution before you try to make it thread safe.
Is synchronized blocks for every variable in the match object my only solution, or can I do something else? I know SQLite has an in memory mode
If you have objects with mutable state that are accessed by multiple threads, then you need to make them thread safe. There is no other way (notice that I didn't say that "synchronized blocks" are the only option; there are different ways to achieve thread safety). Using an in memory database is not the solution to your thread safety problem.
The advantage of using an in memory database is in speeding up the access to information (as you don't have to access a regular database with information stored in an HDD), but with the penalty that now your application needs more RAM.
By the way, even faster than using an in memory database would be to keep all the information that you need within objects in your program (which has the same limitation of requiring more RAM).
We have a requirement where the problem can be narrowed down as follows.
There are multiple keys and each key maps to an integer.
When a key is received on a JVM, you need to retrieve the int value from the shared memory, increment it and then put the incremented value back on the shared memory.
So when two JVMs or two threads read the same value, then the update of one of them should fail consistently, so that you do not lose any increment done by any of the thread on any of the JVM.
Once an update fails, you read again from the shared memory, increment it and then update again till the update is successful or you have exhausted some 'N' number of retries.
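The read, increment and conditional-update cycle described above can be sketched like this. A ConcurrentHashMap stands in for the shared memory purely for illustration (a real data grid would need an equivalent atomic compare-and-replace operation), and `RetryingCounter` and `maxRetries` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class RetryingCounter {
    // Stand-in for the shared store (an assumption for this sketch).
    private final ConcurrentMap<String, Integer> shared = new ConcurrentHashMap<>();
    private final int maxRetries;

    RetryingCounter(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Returns true if the increment was applied within maxRetries attempts.
    boolean increment(String key) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Integer current = shared.get(key);
            boolean updated = (current == null)
                    ? shared.putIfAbsent(key, 1) == null          // first writer wins
                    : shared.replace(key, current, current + 1);  // fails if the value changed
            if (updated) {
                return true;
            }
            // Another writer got in first: loop to re-read and try again.
        }
        return false;
    }

    int value(String key) {
        return shared.getOrDefault(key, 0);
    }
}

final class RetryingCounterDemo {
    // Returns {successful increments, final stored value}.
    static int[] hammer(int threads, int perThread) {
        RetryingCounter counter = new RetryingCounter(1000);
        AtomicInteger successes = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    if (counter.increment("hits")) {
                        successes.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] {successes.get(), counter.value("hits")};
    }

    public static void main(String[] args) {
        int[] result = hammer(4, 1000);
        System.out.println("successes=" + result[0] + ", stored=" + result[1]);
    }
}
```

The property to look for is that the stored value always equals the number of increments that reported success, so no increment is lost.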
Right now we are using Infinispan with optimistic locking, but the behavior is not consistent. Please find the link to that thread:
https://developer.jboss.org/message/914490
Is there any other technology that would fit this requirement well?
Synchronizing between threads is easy, but between JVMs is extremely hard, especially if you need to support multiple platforms. I would suggest centralising the update code using one of the following methods, both of which "contract out" the data update task:
Publish a trivial REST API from a single process that knows how to do the update task, and serialize the requests.
Use a relational database to hold the counts, and make sure the client code correctly rolls back transactions when they don't succeed.
Probably not what you wanted to hear, but either method will work well.
I read the following statement:
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?
I will try to address both of your questions:
Improve speed
If the ArrayList were synchronized and multiple threads were trying to read data out of the list at the same time, the threads would have to wait to get an exclusive lock on the list. By leaving the list unsynchronized, the threads don't have to wait and the program will run faster.
Unsafe
If multiple threads are reading and writing to a list at the same time, the threads can see an inconsistent view of the list, and this can cause instability in multi-threaded programs.
The whole point of synchronization is that it means only one thread has access to an object at any given time. Take a box of chocolates as an example. If the box is synchronized (Vector), and you get there first, no one else can take any and you get your pick. If the box is NOT synchronized (ArrayList), anyone walking by can snag a chocolate: it will disappear faster, but you may not get the ones you want.
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?
When multiple threads are reading/writing to a shared memory location, the program might compute incorrect results due to lack of mutual exclusion and proper visibility. Hence lack of synchronization is considered "unsafe". This blog post by Jeremy Manson might provide a good introduction to the topic.
When the JVM executes a synchronized method, it makes sure that the current thread has an exclusive lock on the object on which the method is invoked. Similarly, when the method finishes execution, the JVM releases the lock held by the executing thread. Synchronized methods provide mutual exclusion and visibility guarantees, and are important for the "safety" (i.e. guaranteeing correctness) of the executing code. But if only one thread is ever accessing the methods of the object, there are no safety issues to worry about. Although JVM performance has improved over the years, uncontended synchronization (i.e. locking/unlocking of objects accessed by only one thread) still takes a non-zero amount of time. For unsynchronized methods, the JVM does not pay this extra penalty, hence they are faster than their synchronized counterparts.
Vectors force their choice on you. All methods are synchronized and it is difficult to use them incorrectly. But when Vectors are used in a single-threaded context, you pay the price for the extra synchronization unnecessarily. ArrayLists leave the choice to you. When used in a multi-threaded context, it is up to you (the programmer) to correctly synchronize the code; but when used in a single-threaded context you are guaranteed not to pay any extra synchronization overhead.
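One way of "correctly synchronizing" an ArrayList yourself is to wrap it with Collections.synchronizedList, which is a standard JDK method. The demo class and method names below are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SynchronizedListDemo {
    // Several threads append to one list; the wrapper serializes each add().
    static int fillConcurrently(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println("size: " + fillConcurrently(4, 1000));
    }
}
```

With a bare ArrayList in place of the wrapper, the same code may lose elements or throw, because add() is not atomic. Note that iterating over the wrapped list still requires holding the list's lock manually.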
Also, when a collection is populated initially and only read subsequently, ArrayLists perform better even in a multi-threaded context. For example, consider this method:
public synchronized List<String> getList() {
    List<String> list = new Vector<String>();
    list.add("Foo");
    list.add("Bar");
    return Collections.unmodifiableList(list);
}
A list is created, populated, and an immutable view of it is safely published. Looking at the code above it is clear that all subsequent uses of this list are reads and won't need any synchronization even when used by multiple threads - the object is effectively immutable. Using a Vector here incurs the synchronization overhead even for reads where it is not needed; using an ArrayList instead would perform better.
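For comparison, the ArrayList variant suggested here might look like the sketch below. `ListFactory` is a hypothetical holder class, and how the returned reference gets shared between threads is still up to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListFactory {
    // Populate privately, then hand out only a read-only view.
    static List<String> getList() {
        List<String> list = new ArrayList<>();
        list.add("Foo");
        list.add("Bar");
        // No reference to the mutable list escapes, so it is effectively immutable.
        return Collections.unmodifiableList(list);
    }

    public static void main(String[] args) {
        List<String> names = getList();
        System.out.println(names);
        try {
            names.add("Baz");
        } catch (UnsupportedOperationException e) {
            System.out.println("read-only view rejects writes");
        }
    }
}
```

Reads on the returned list take no lock at all, which is where the saving over Vector comes from.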
Data structures that synchronize use locks (or other synchronization constructs) to ensure that their data is always in a consistent state. Oftentimes, this requires that one or more threads wait on another thread to finish updating the structure's state, which will then reduce performance, since a wait has been introduced where before there was none.
Two threads can modify the list at the same time, adding a new item or deleting/modifying the same item, because no synchronization (or lock mechanism, if you prefer) exists. So imagine you delete an item from the list while somebody else is trying to work with it, or you modify an item while someone else is using it: that is not very safe.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html
Read the "Note that this implementation is not synchronized." paragraph, it explains a bit better.
And I forgot: regarding speed, it is easy to see that when you try to control access to data, you add mechanisms that prevent others from accessing it. Those mechanisms mean extra computation, so it is slower...
Non-blocking data structures will be faster than ones that block, for exactly that reason. With blocking data structures, if a resource has been acquired by some entity, it will take time for another entity to acquire that same resource once it becomes available.
However, this can be less safe in some instances, depending on the situation. The main points of contention are during writes. If it can be guaranteed that the data contained in a data structure will not change after it has been added and will only be accessed to read the value, then there will not be a problem. The issues arise when there is a conflict between a write and a read, or a write and a write.
Here's my thinking:
Even though an HTTP request cycle is essentially handled by a 'single thread', each time an HTTP request is processed for that same session it is likely to be processed by a different thread from the thread pool.
Without the volatile keyword being used on a domain model object, whose lifecycle extends across multiple HTTP requests for the same session, then, according to my understanding, isn't it possible that the attribute could be thread local cached (an optimization by the compiler) in the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread then that second thread may not see the changes in that attribute that were made by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?
I think you are forgetting that the threads handling the HTTP request first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model - it has to retrieve it from the session implementation at the start of handling each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 had references to the same object instance simultaneously while processing two different requests, and you were concerned about changes in one thread being visible to the other as each is processing its request, but this does not sound like what you are asking about.
Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens-before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and doing a volatile read of the same field in another thread establishes a happens-before relationship between the two threads, and ensures visibility of the write.
This is a complicated problem; simply using a volatile keyword is probably not a good solution.
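To illustrate the happens-before edge that a volatile write/read pair creates, here is a minimal sketch. `VolatilePublish` and its members are hypothetical names, and the pattern only covers one-shot publication, not compound updates.

```java
final class VolatilePublish {
    private int payload;            // plain field, written before the flag
    private volatile boolean ready; // volatile flag that publishes the payload

    void publish(int value) {
        payload = value; // ordinary write
        ready = true;    // volatile write: everything written before it becomes visible
    }

    int awaitPayload() {
        while (!ready) { // volatile read: eventually observes the writer's flag
            Thread.yield();
        }
        return payload;  // guaranteed to see the value written before ready = true
    }

    static int demo() {
        VolatilePublish box = new VolatilePublish();
        Thread writer = new Thread(() -> box.publish(42));
        writer.start();
        return box.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println("payload: " + demo());
    }
}
```

Without volatile on the flag, the reader would be allowed to spin forever or to see a stale payload. With it, visibility is guaranteed, but something like count++ on a volatile field would still not be atomic.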
I think your understanding of it is correct. Given your description I would say it should be used. If it's something more than a primitive type I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml
If you have a mutable object in the session, that is trouble. But usually the solution is not to guard individual fields; rather, the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object and modify it in place. It's better to create a completely new user object and insert it into the session.
In that case, fields in User don't need any protection; thread safety is guaranteed by session setAttribute() - getAttribute().
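A sketch of the swap-the-whole-object approach follows. Since HttpSession is not available in plain Java, `FakeSession` is a hypothetical stand-in backed by a ConcurrentHashMap, and `SessionUser` and `UserUpdater` are likewise invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Immutable user: every field is final, so instances can be shared freely.
final class SessionUser {
    final String name;
    final String email;

    SessionUser(String name, String email) {
        this.name = name;
        this.email = email;
    }

    // "Modification" produces a new object instead of mutating this one.
    SessionUser withEmail(String newEmail) {
        return new SessionUser(name, newEmail);
    }
}

// Hypothetical stand-in for HttpSession attribute storage.
final class FakeSession {
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    void setAttribute(String key, Object value) { attributes.put(key, value); }

    Object getAttribute(String key) { return attributes.get(key); }
}

final class UserUpdater {
    static void changeEmail(FakeSession session, String newEmail) {
        SessionUser current = (SessionUser) session.getAttribute("user");
        // Build the replacement completely, then swap it in with a single call.
        session.setAttribute("user", current.withEmail(newEmail));
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession();
        session.setAttribute("user", new SessionUser("ann", "old@example.com"));
        changeEmail(session, "new@example.com");
        SessionUser user = (SessionUser) session.getAttribute("user");
        System.out.println(user.name + " -> " + user.email);
    }
}
```

A request that fetched the old object keeps a consistent, if slightly stale, view of it. Two concurrent modifying requests can still overwrite each other, so this helps with visibility rather than with lost updates.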
If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of Session, I'd recommend that you keep just the object's ID and use it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrieves will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic for this specific object in a single façade, and do the concurrency control there, using database locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but it doesn't mean anything if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say, t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute 01, and then 02, and t1 then will execute 02, overwriting what t2 had done. The operations containsKey() and put() are atomic individually, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
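One way to make the whole check-then-put block atomic is to hold the Hashtable's own monitor around it, since Hashtable's methods synchronize on the table instance. In the sketch below, `WholeBlockLocking`, `valueFor` and the `calculate` stand-in are hypothetical names, and this is just one option among several.

```java
import java.util.Hashtable;
import java.util.concurrent.atomic.AtomicInteger;

final class WholeBlockLocking {
    private final Hashtable<String, Integer> hashtable = new Hashtable<>();
    private final AtomicInteger calculations = new AtomicInteger();

    // Stand-in for an expensive computation; counts how often it runs.
    private int calculate(String key) {
        calculations.incrementAndGet();
        return key.length();
    }

    int valueFor(String key) {
        // Holding the table's monitor makes containsKey() + put() one atomic step.
        synchronized (hashtable) {
            if (!hashtable.containsKey(key)) {
                hashtable.put(key, calculate(key));
            }
            return hashtable.get(key);
        }
    }

    int calculations() {
        return calculations.get();
    }

    // Many threads ask for the same key at once; returns how often calculate() ran.
    static int calculationsUnderContention(int threads) {
        WholeBlockLocking cache = new WholeBlockLocking();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> cache.valueFor("expensive-key"));
            workers[i].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cache.calculations();
    }

    public static void main(String[] args) {
        System.out.println("calculate() ran " + calculationsUnderContention(8) + " time(s)");
    }
}
```

The trade-off is that calculate() now runs while the lock is held, so unrelated keys wait too. That is the price of locking at the level of the whole block.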
When it comes to concurrency, there's no magic. I mean, some crappy frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start to get heavy traffic. Or (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming. And the only way to handle it is to avoid it. All this functional programming trend is not about dealing with concurrency; it is about avoiding it altogether.
It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."