Updating integer atomically over multiple JVMs for every key

We have a requirement that can be narrowed down as follows.
There are multiple keys, and each key maps to an integer.
When a key is received on a JVM, you need to retrieve the int value from the shared memory, increment it, and then put the incremented value back into the shared memory.
So when two JVMs or two threads read the same value, the update of one of them should fail consistently, so that you do not lose any increment made by any thread on any JVM.
Once an update fails, you read again from the shared memory, increment the value, and update again until the update succeeds or you have exhausted some 'N' number of retries.
Right now we are using Infinispan with optimistic locking, but the behavior is not consistent. Please find the link to that thread.
https://developer.jboss.org/message/914490
Is there any other technology that would fit this requirement well?
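The retry scheme described in the question is an optimistic compare-and-set loop. A minimal single-JVM sketch, assuming the shared store offers an atomic conditional replace in the style of java.util.concurrent.ConcurrentMap.replace(key, expected, newValue); here a plain ConcurrentHashMap stands in for the distributed store, and OptimisticCounter is a hypothetical name:

```java
import java.util.concurrent.ConcurrentMap;

public class OptimisticCounter {
    // Stand-in for the shared store; assumed to support atomic conditional updates.
    private final ConcurrentMap<String, Integer> store;

    public OptimisticCounter(ConcurrentMap<String, Integer> store) {
        this.store = store;
    }

    /** Returns the new value, or throws once maxRetries attempts have all lost the race. */
    public int increment(String key, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Integer current = store.get(key);
            if (current == null) {
                // First writer for this key wins only if nobody created it in the meantime.
                if (store.putIfAbsent(key, 1) == null) {
                    return 1;
                }
                continue; // someone else created it: re-read and retry
            }
            int next = current + 1;
            // Compare-and-set: fails if another writer changed the value since our read.
            if (store.replace(key, current, next)) {
                return next;
            }
        }
        throw new IllegalStateException("Gave up after " + maxRetries + " retries for key " + key);
    }
}
```

Because the counter only ever grows, comparing the old value is enough here; a value that can return to an earlier state (the ABA problem) would need a separate version number instead.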

Synchronizing between threads is easy, but synchronizing between JVMs is extremely hard, especially if you need to support multiple platforms. I would suggest centralising the update code using one of the following methods, both of which "contract out" the data update task:
Publish a trivial REST API from a single process that knows how to do the update task, and serialize the requests.
Use a relational database to hold the counts, and make sure the client code correctly rolls back transactions when they don't succeed.
Probably not what you wanted to hear, but either method will work well.
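A minimal sketch of the core of the first option, assuming an in-memory map is acceptable for illustration (it is lost if the process dies, so a real service would persist the counts): every increment runs on one dedicated thread, which serializes the requests. CounterService is a hypothetical name, and the REST layer in front of it is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CounterService implements AutoCloseable {
    // One worker thread: increments are executed strictly one after another.
    private final ExecutorService writer = Executors.newSingleThreadExecutor();
    // Only ever touched by the writer thread, so a plain HashMap is sufficient.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Hypothetical entry point a REST handler would call; completes with the new value. */
    public CompletableFuture<Integer> increment(String key) {
        return CompletableFuture.supplyAsync(() -> counts.merge(key, 1, Integer::sum), writer);
    }

    @Override
    public void close() {
        writer.shutdown();
    }
}
```

Clients on any JVM never read-modify-write themselves; they just ask this one process, so there is nothing to retry.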

Related

Can I have local state in a Kafka Processor?

I've been reading a bit about the Kafka concurrency model, but I still struggle to understand whether I can have local state in a Kafka Processor, or whether that will fail in bad ways.
My use case is: I have a topic of updates, I want to insert these updates into a database, but I want to batch them up first. I batch them inside a Java ArrayList inside the Processor, and send them and commit them in the punctuate call.
Will this fail in bad ways? Am I guaranteed that the ArrayList will not be accessed concurrently?
I realize that there will be multiple Processors and multiple ArrayLists, depending on the number of threads and partitions, but I don't really care about that.
I also realize I will lose the ArrayList if the application crashes, but I don't care if some events are inserted twice into the database.
This works fine in my simple tests, but is it correct? If not, why?

What you use for local state in your Kafka consumer application is up to you, and you can guarantee that only the current thread/consumer will access the local state in your ArrayList. If you have multiple threads, one per Kafka consumer, each thread can have its own private ArrayList or HashMap to store state in. You could also use something like a local RocksDB database for persistent local state.
A few things to look out for:
If you're batching updates together to send to the DB, are those updates in any way related, say, because they're part of a transaction? If not, you might run into problems. An easy way to ensure this is the case is to set a key for your messages with a transaction ID, or some other unique identifier for the transaction, and that way all the updates with that transaction ID will end up in one specific partition, so whoever consumes them is sure to always have the
How are you validating that you got ALL the transactions before your batch update? Again, this is important if you're dealing with database updates inside transactions. You could simply wait for a pre-determined amount of time to ensure you have all the updates (say, maybe 30 seconds is enough in your case). Or maybe you send an "EndOfTransaction" message that details how many messages you should have gotten, as well as maybe a CRC or hash of the messages themselves. That way, when you get it, you can either use it to validate you have all the messages already, or you can keep waiting for the ones that you haven't gotten yet.
Make sure you don't commit to Kafka the offsets of the messages you're keeping in memory until after you've batched and sent them to the database and confirmed that the updates went through successfully. This way, if your application dies, the next time it comes back up it will receive again the messages you haven't yet committed in Kafka.
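A minimal, Kafka-agnostic sketch of the last point; BatchingBuffer and its methods are hypothetical names, not Kafka API. Records stay in memory, and the offset considered safe to commit only advances after the database write succeeds. It is deliberately not thread-safe, matching the one-consumer-per-thread setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingBuffer<T> {
    private final List<T> pending = new ArrayList<>();
    private long lastBufferedOffset = -1;
    private long committableOffset = -1;

    public void add(T record, long offset) {
        pending.add(record);
        lastBufferedOffset = offset;
    }

    /**
     * Hands the batch to the sink (e.g. a JDBC batch insert). If the sink throws, the batch
     * is kept and the committable offset does not move, so nothing is committed prematurely.
     */
    public void flush(Consumer<List<T>> sink) {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
        committableOffset = lastBufferedOffset; // only now is the batch safely stored
    }

    /** Offset of the last record known to be in the database, or -1 if none yet. */
    public long committableOffset() {
        return committableOffset;
    }

    public int pendingCount() {
        return pending.size();
    }
}
```

With the plain KafkaConsumer API, the value you commit is conventionally the offset of the next record to read, i.e. committableOffset() + 1.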

Design AppServer Interview Discussion

I encountered the following question in a recent System Design Interview:
Design an AppServer that interfaces with a Cache and a DB.
I came up with this:
public class AppServer {
    public Database DB;
    public Cache cache;

    public Value get(Key k) {
        Value res = cache.get(k);
        if (res == null) {      // cache miss: fall back to the DB
            res = DB.get(k);
            cache.set(k, res);  // populate the cache for later reads
        }
        return res;
    }

    public void set(Key k, Value v) {
        cache.set(k, v);
        DB.set(k, v);
    }
}
This code is fine and works correctly, but the follow-ups to the question are:
What if there are multiple threads?
What if there are multiple instances of the AppServer?
Suddenly AppServer performance degrades a ton; we find out this is because our cache is consistently missing. The cache size is fixed (already the largest it can be). How can we prevent this?
Response:
I answered that we can use locks or condition variables. In Java, we can add synchronized to each method to allow for mutual exclusion, but the interviewer mentioned that this isn't very efficient and wanted only the critical parts synchronized.
I thought that we only need to synchronize the 2 set lines in void set(Key k, Value v) and the 1 set call in Value get(Key k); however, the interviewer pushed for also synchronizing res = DB.get(k);. I agreed with him in the end, but I don't fully understand why. Don't threads have independent stacks and a shared heap? When a thread executes get, it stores res in a local variable on its stack frame, so even if another thread also executes get, the first thread retains its own value. Then each thread sets its own fetched value.
How can we handle multiple instances of the AppServer?
I came up with a distributed queue solution like Kafka: every time we perform a set/get command, we queue that command. He said set is OK because the action just sets a value in the cache/DB, but asked how you would return the correct value for get. Can someone explain this?
Also, are there possible solutions using a versioning system or an event system?
Possible solutions:
L1, L2, L3 caches - layers and more caches
Regional / Segmentation caches - use different caches for different user groups.
Any other ideas?
Will upvote all insightful responses :)

1
Although JDBC is "supposed" to be thread-safe, some drivers aren't, and I'm going to assume that Cache isn't thread-safe either (although most caches should be), so in that case you would need to make the following changes to your code:
Make both fields final
Synchronize the ENTIRE get(...) method
Synchronize the ENTIRE set(...) method
Assuming there is no other way to access the said fields, the behavior of your get(...) method depends on 2 things: first, that updates from the set(...) method can be seen, and secondly, that a cache miss is then stored only by a single thread. You need to synchronize because the idea is to only have one thread perform an expensive DB query in the case that there is a cache miss. If you do not synchronize the entire get(...) method, or you split the synchronized statement, it is possible for another thread to also see a cache miss between the lookup and insertion.
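A minimal sketch of those three changes, assuming for illustration that the cache and the DB are in-memory maps standing in for the question's Cache and Database types (SynchronizedAppServer is a hypothetical name):

```java
import java.util.HashMap;
import java.util.Map;

public class SynchronizedAppServer<K, V> {
    private final Map<K, V> cache = new HashMap<>(); // final: safely published to all threads
    private final Map<K, V> db;                      // stand-in for the Database

    public SynchronizedAppServer(Map<K, V> db) {
        this.db = db;
    }

    public synchronized V get(K k) {
        V res = cache.get(k);
        if (res == null) {
            // Holding the lock across lookup, DB read and insertion means only one thread
            // queries the DB for a given miss, and no set() can slip in between.
            res = db.get(k);
            if (res != null) {
                cache.put(k, res);
            }
        }
        return res;
    }

    public synchronized void set(K k, V v) {
        cache.put(k, v);
        db.put(k, v);
    }
}
```

The cost is the one the interviewer pointed at: every call, including pure cache hits, waits for the same lock.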
The way I would answer this question is, honestly, just to toss the entire thing. I would look at how JCIP (Java Concurrency in Practice) builds its cache and base my answer on that.
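JCIP's cache example (the Memoizer) keeps a ConcurrentHashMap of futures so that each key is computed once without a global lock. A shorter sketch in the same spirit, assuming Java 8+ and not the book's code, relies on ConcurrentHashMap.computeIfAbsent and compute, which are atomic per key. LoadingCache is a hypothetical name, and the two functional parameters stand in for DB.get and DB.set.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

public class LoadingCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> dbReader;   // hypothetical stand-in for DB.get
    private final BiConsumer<K, V> dbWriter; // hypothetical stand-in for DB.set

    public LoadingCache(Function<K, V> dbReader, BiConsumer<K, V> dbWriter) {
        this.dbReader = dbReader;
        this.dbWriter = dbWriter;
    }

    public V get(K key) {
        // Atomic per key: on a miss the reader runs at most once, and other threads asking
        // for the same key wait for that result. A null result is not cached.
        return cache.computeIfAbsent(key, dbReader);
    }

    public void set(K key, V value) {
        // Also atomic per key, so a get() or set() for the same key cannot interleave
        // between the DB write and the cache update.
        cache.compute(key, (k, old) -> {
            dbWriter.accept(k, value);
            return value;
        });
    }
}
```

While a load or write is in progress, ConcurrentHashMap may block other updates that land in the same internal bin, so the DB calls should be reasonably quick and must not modify this map. This only coordinates threads within one JVM.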
2
I think your queue solution is fine.
I believe your interviewer means that if one instance of AppServer had not cached a value that was already set(...) by another instance, it would look it up and find the correct value in the DB. This solution would be incorrect if you are using multiple threads, because 2 threads could be set(...)ing conflicting values: the caches would then hold 2 different values, and, depending on the thread safety of your DB, the DB might not even have the value at all.
Ideally, you'd never create more than a single instance of your AppServer.
3
I don't have enough experience to evaluate this question specifically, but perhaps an LRU cache would improve performance somewhat, or a hash ring buffer. It might be a stretch, but if you wanted to throw it out there, perhaps even using ML to determine the best values to preload or retain at certain times of the day could also work.
If you are always missing values from your cache, there is no way to improve your code. Performance would be dependent on your database.
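For the LRU suggestion, a minimal single-threaded sketch using the JDK's LinkedHashMap in access order together with its removeEldestEntry hook (LruCache is a hypothetical name):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: iteration runs from least- to most-recently accessed entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; evicts the least-recently-used entry once over capacity.
        return size() > capacity;
    }
}
```

This is not thread-safe (wrap it with Collections.synchronizedMap if needed), and it only helps when recent keys are likely to be requested again; for a scan-like access pattern it will miss just as often.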

Java in memory data storage thread safety

I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects. Each match object contains information about the game and game state for all players (anywhere from 2-5 in one match). The server will pass the same match object for each user's connection to the server.
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The problem with that is there would be a lot of variables/lists in the object, all of which would need to be synchronized. Some of them may need to be used to perform calculations that affect each other, meaning I would need nested synchronized blocks, which I don't want.
Are synchronized blocks for every variable in the match object my only solution, or can I do something else?
I know SQLite has an in memory mode, but the problem I found was this:
Quote from their website:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
A few dozen milliseconds? That's a long time. Would that be fast enough, or is there another in memory database that would be suited for real time games?

Your architecture is off in this case. You want a set of data to be modified and updated by several threads at once, which might be possible, but is extremely difficult to get right and fast at the same time.
It would be much easier if you change the architecture as follows:
There is one thread that has exclusive access to a single match object. A thread could handle multiple match objects, but a single match object will only be handled/guarded by a single thread. Now if any external effect wants to change any values, it needs to make a "change request"; it cannot change the values immediately on its own. Once the change has been implemented and the values updated, the thread guarding the match object sends out an update to the clients.
So let's say a player scores a goal; then the client thread calls a function:
void clientScoredGoal(Client client) throws InterruptedException {
    // Only enqueue the request; the match thread applies it.
    actionQueue.put(new GoalScoredEvent(client));
}
where actionQueue is, e.g., a BlockingQueue.
The thread handling the match objects listens on this queue via actionQueue.take() and reacts as soon as a new action arrives. It then applies the change, updates internal values if necessary, and distributes an update package (a "change request" to clients, if you will).
Also, in general, synchronized should be considered bad practice in Java. There are certain situations where it is a good way to handle synchronization, but in something like 99% of cases, using features from the java.util.concurrent package will be by far the better solution. Notice the complete lack of synchronized in the example code above, yet it is perfectly thread-safe.
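A fuller sketch of the single-owner thread, under the assumption that events are simple objects (MatchLoop, GoalScored and Stop are hypothetical names, and the score map stands in for the whole match state). Client threads only call submit(); only the loop thread ever touches the state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MatchLoop implements Runnable {
    /** Hypothetical event types; a real game would define many more. */
    public interface MatchEvent {}

    public static final class GoalScored implements MatchEvent {
        final String playerId;
        public GoalScored(String playerId) { this.playerId = playerId; }
    }

    public static final class Stop implements MatchEvent {}

    private final BlockingQueue<MatchEvent> actionQueue = new LinkedBlockingQueue<>();
    // Only the loop thread reads or writes this map, so it needs no locking.
    private final Map<String, Integer> scores = new HashMap<>();

    /** Called from any client-connection thread; never touches match state directly. */
    public void submit(MatchEvent event) {
        actionQueue.offer(event); // an unbounded LinkedBlockingQueue always accepts
    }

    @Override
    public void run() {
        try {
            while (true) {
                MatchEvent event = actionQueue.take(); // blocks until an event arrives
                if (event instanceof Stop) {
                    return;
                }
                if (event instanceof GoalScored) {
                    scores.merge(((GoalScored) event).playerId, 1, Integer::sum);
                    // A real server would now broadcast the updated state to the clients.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag and leave the loop
        }
    }

    /** Call only from the loop thread, or after that thread has been joined. */
    public int scoreOf(String playerId) {
        return scores.getOrDefault(playerId, 0);
    }
}
```

In a server you would start it with new Thread(loop).start(); one loop thread can also own several matches if each event carries a match id.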

The question is very generic, so it is difficult to give specific advice.
I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects.
If you want to store "match" objects in a Map and then have multiple threads requesting/adding/removing objects from the map, then you have to use a thread-safe map such as "ConcurrentHashMap".
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The safest and easiest way to have multithreading is to make each "match" an immutable object, then there is no need to synchronize.
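A minimal sketch combining the two points above, assuming a match's state can be reduced to a score map for illustration (MatchRegistry and Match are hypothetical names). Each change builds a new immutable Match, and ConcurrentHashMap.compute swaps it in atomically per match id, so readers never need a lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MatchRegistry {
    /** Hypothetical immutable match: every change produces a new Match. */
    public static final class Match {
        private final Map<String, Integer> scores;

        public Match() {
            this.scores = Collections.emptyMap();
        }

        private Match(Map<String, Integer> scores) {
            this.scores = Collections.unmodifiableMap(scores);
        }

        public Match withGoal(String playerId) {
            Map<String, Integer> copy = new HashMap<>(scores); // copy-on-write
            copy.merge(playerId, 1, Integer::sum);
            return new Match(copy);
        }

        public int scoreOf(String playerId) {
            return scores.getOrDefault(playerId, 0);
        }
    }

    private final ConcurrentHashMap<String, Match> matches = new ConcurrentHashMap<>();

    /** compute() runs atomically per key, so concurrent goals in one match are not lost. */
    public void recordGoal(String matchId, String playerId) {
        matches.compute(matchId, (id, m) -> (m == null ? new Match() : m).withGoal(playerId));
    }

    /** Returns a snapshot that never changes, even while other threads update the match. */
    public Match get(String matchId) {
        return matches.getOrDefault(matchId, new Match());
    }
}
```

The trade-off is an object copy per change, which is usually cheap for small per-match state.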
If "match" information is mutable and accessed simultaneously by many threads, then you will have to synchronize. But in this case, the "mutable state" is contained within a "match", so only the class "match" will need to use synchronization.
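If the match stays mutable, keeping all of its state private and guarded by the match's own lock means related fields can be updated together without nested blocks. A minimal sketch, with hypothetical health/score fields and method names:

```java
import java.util.HashMap;
import java.util.Map;

public class GuardedMatch {
    // Both maps are private and only accessed while holding this object's lock.
    private final Map<String, Integer> health = new HashMap<>();
    private final Map<String, Integer> score = new HashMap<>();

    public synchronized void addPlayer(String playerId) {
        health.put(playerId, 100);
        score.put(playerId, 0);
    }

    /** Updates two related fields in one critical section, so no nested locks are needed. */
    public synchronized boolean applyHit(String attacker, String target, int damage) {
        Integer hp = health.get(target);
        if (hp == null || hp <= 0 || !score.containsKey(attacker)) {
            return false; // unknown player, or target already out
        }
        int remaining = Math.max(0, hp - damage);
        health.put(target, remaining);
        if (remaining == 0) {
            score.merge(attacker, 1, Integer::sum);
        }
        return true;
    }

    public synchronized int healthOf(String playerId) {
        return health.getOrDefault(playerId, 0);
    }

    public synchronized int scoreOf(String playerId) {
        return score.getOrDefault(playerId, 0);
    }
}
```

One lock per match keeps contention limited to the 2-5 players of that match.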
I would need nested synchronized blocks, which I don't want.
I haven't ever seen the need to have nested synchronized blocks. Perhaps you should refactor your solution before you try to make it thread-safe.
Is synchronized blocks for every variable in the match object my only solution, or can I do something else? I know SQLite has an in memory mode
If you have objects with mutable state that are accessed by multiple threads, then you need to make them thread-safe. There is no other way (notice that I didn't say that "synchronized blocks" are the only option; there are different ways to achieve thread safety). Using an in-memory database is not the solution to your thread safety problem.
The advantage of using an in memory database is in speeding up the access to information (as you don't have to access a regular database with information stored in an HDD), but with the penalty that now your application needs more RAM.
By the way, even faster than using an in memory database would be to keep all the information that you need within objects in your program (which has the same limitation of requiring more RAM).

How synchronized block is handled in a clustered environment

By clustered environment I mean the same code running on multiple server machines. The scenario I can think of is as follows:
Multiple requests come in at the same time, from different threads, to update card details based on expiry time. A snippet of the code follows:
synchronized (card) { // card object
    if (card.isExpired()) {
        updateCard();
    }
}
My understanding is that a synchronized block works at the JVM level, so how is this achieved in a multi-server environment?
Please suggest edits to rephrase the question. I've asked what I can recollect of a question that was put to me.

As you said, synchronized block is only for "local JVM" threads.
When it comes to cluster, it is up to you how you drive your distributed transaction.
It really depends where your objects (e.g. card) are stored.
Database - You will probably need to use some locking strategy: very likely optimistic locking, which stores a version of the entity and checks it whenever a change is made, or the "safer" pessimistic locking, where you lock the whole row while making changes.
Memory - You will probably need an in-memory data grid solution (e.g. Hazelcast) and make use of its transaction support, or implement it yourself.
Anywhere else? You will have to specify...
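A minimal sketch of the optimistic (version-based) strategy, using an in-memory map as a stand-in for the database; the card table, columns and class names are hypothetical. The conditional update mirrors a statement such as UPDATE card SET expiry = ?, version = version + 1 WHERE id = ? AND version = ?, where "0 rows updated" means another node changed the card first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class VersionedCardStore {
    /** Immutable row snapshot; identity equality makes replace() a true compare-and-set. */
    public static final class Card {
        public final String expiry;
        public final long version;

        Card(String expiry, long version) {
            this.expiry = expiry;
            this.version = version;
        }
    }

    private final ConcurrentMap<String, Card> cards = new ConcurrentHashMap<>();

    public void insert(String id, String expiry) {
        cards.put(id, new Card(expiry, 0));
    }

    public Card read(String id) {
        return cards.get(id);
    }

    /** Returns true if this update won, false if the caller's version was stale. */
    public boolean update(String id, long expectedVersion, String newExpiry) {
        Card current = cards.get(id);
        if (current == null || current.version != expectedVersion) {
            return false; // someone else already changed (or removed) the card
        }
        // Succeeds only if nobody swapped the row since we read it.
        return cards.replace(id, current, new Card(newExpiry, expectedVersion + 1));
    }
}
```

A caller that gets false re-reads the card, re-checks isExpired(), and decides whether an update is still needed.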

See, in a clustered environment, you will usually have multiple JVMs running the same code. If traffic is high, then actually the number of JVMs could auto-scale and increase (new instances could be spawned). This is one of the reasons why you should be really careful when using static fields to keep data in a distributed environment.
Next, coming to your actual question: if you have a single JVM serving requests, then all other threads will have to wait to get that lock. If you have multiple JVMs running, then a lock acquired by one thread on one JVM will not prevent acquisition of the (in reality not the same, but conceptually the same) lock by another thread in a different JVM.

I am assuming you want to ensure that only one thread can edit the object or perform the action (based on the method name, i.e. updateCard). I suggest you implement optimistic locking (versioning), which Hibernate can do quite easily, to prevent one update from silently overwriting another.

How to iterate over db records correctly with hibernate

I want to iterate over records in the database and update them. However, since the updating both takes some time and is prone to errors, I need to a) not keep the DB waiting (as, e.g., with ScrollableResults) and b) commit after each update.
Second thing is that this is done in multiple threads, so I need to ensure that if thread A is taking care of a record, thread B is getting another one.
How can I implement this sensibly with hibernate?
To give a better idea, the following code would be executed by several threads, where all threads share a single instance of the RecordIterator:
Iterator<Record> iter = db.getRecordIterator();
while (iter.hasNext()) {
    Record rec = iter.next();
    // do something lengthy here
    db.save(rec);
}
So my question is how to implement the RecordIterator. If I perform a query on every next(), how do I ensure that I don't return the same record twice? If I don't, which query should I use to return detached objects? Is there a flaw in the general approach (e.g. should I use one RecordIterator per thread and let the DB somehow handle synchronization)? Additional info: there are way too many records to keep them locally (e.g. in a set of processed records).
Update: Because the overall process takes some time, the status of records can change in the meantime. Because of that, the ordering of a query's results can change. I guess to solve this problem I have to mark records in the database once I return them for processing...

Hmmm, what about pushing your objects from a reader thread into a bounded blocking queue, and letting your updater threads read from that queue?
In your reader, do some paging with setFirstResult/setMaxResults. E.g. if you have 1000 elements maximum in your queue, fill them up 500 at a time. When the queue is full, the next push will automatically wait until the updaters take the next elements.
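A minimal sketch of that reader/updater layout, assuming records are identified by non-negative long ids; fetchPage is a hypothetical stand-in for a query using setFirstResult(offset)/setMaxResults(limit), and update is whatever loads, modifies and commits one record in its own transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;
import java.util.function.Consumer;

public class PagedWorkQueue {
    private static final Long POISON = Long.MIN_VALUE; // sentinel: assumes real ids are >= 0

    public static int processAll(BiFunction<Integer, Integer, List<Long>> fetchPage,
                                 int pageSize, int workerCount, Consumer<Long> update) {
        // Bounded: the reader blocks once two pages are waiting to be processed.
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(pageSize * 2);
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Long id = queue.take();        // each id goes to exactly one worker
                        if (id.equals(POISON)) return; // reader signalled "no more work"
                        try {
                            update.accept(id);
                            processed.incrementAndGet();
                        } catch (RuntimeException e) {
                            // a real updater would log this and maybe mark the record as failed
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.start();
            workers.add(t);
        }
        try {
            for (int offset = 0; ; offset += pageSize) {
                List<Long> page = fetchPage.apply(offset, pageSize);
                if (page.isEmpty()) break;
                for (Long id : page) queue.put(id);    // blocks while the queue is full
            }
            for (int i = 0; i < workerCount; i++) queue.put(POISON);
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding workers", e);
        }
        return processed.get();
    }
}
```

If record status can change mid-run (as the question's update notes), offset paging over a filtered query can skip or repeat rows; paging by key instead (WHERE id > lastSeenId ORDER BY id) is a common alternative.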

Since you're sharing an instance of the master iterator, my suggestion would be to run all of your threads using a shared Hibernate transaction, with one load at the beginning and a big save at the end. You load all of your data into a single 'Set' which you can iterate over using your threads (be careful with locking: you might want to split off a section for each thread, or otherwise manage the shared resource so that the threads don't overlap).
The beauty of the Hibernate solution is that, since you're using a transaction, the records aren't immediately saved to the database and are instead held in Hibernate's cache. Then at the end they are all written back to the database at once. This saves on the expensive database writes you're worried about, and it gives you an actual object to work with on each iteration instead of just a database row.
I see in your update that the status of the records may change during processing, and this could always cause a problem. If this is a constantly running or long-running process, then my advice for a Hibernate solution would be to work in smaller sets and, yes, add a flag to mark records that have been updated, so that when you move to the next set you can pick up the ones that haven't been touched.