I have a question regarding synchronization of a HashMap. The background is that I am trying to implement a simple form of brute-force detection. I will use a map whose key is the username and whose value is the number of failed login attempts for that user. If a login fails, I want to do something like this:
Integer failedAmount = myMap.get("username");
if (failedAmount == null) {
    myMap.put("username", 1);
} else {
    failedAmount++;
    if (failedAmount >= THRESHOLD) {
        // possible brute force detected! alert admin / slow down login
        // / or whatever
    }
    myMap.put("username", failedAmount);
}
The mechanism I have in mind at the moment is pretty simple: I would just track this for the whole day and clear() the HashMap at midnight or something like that.
So my question is:
What is the best/fastest Map implementation I can use for this? Do I need a fully synchronized map (Collections.synchronizedMap()) or is a ConcurrentHashMap sufficient? Or maybe even just a normal HashMap? I guess it's not that much of a problem if a few increments slip through?
I would use a combination of ConcurrentHashMap and AtomicInteger: http://docs.oracle.com/javase/6/docs/api/java/util/concurrent/atomic/AtomicInteger.html.
Using AtomicInteger will not help you with the comparison, but it will help you keep the numbers accurate - there is no need to do the ++ and the put in two separate steps.
On the ConcurrentHashMap, I would use the putIfAbsent method, which eliminates your first if condition. Note that putIfAbsent returns the previously mapped value (or null if there was none), so you still need to keep a reference to the counter that actually ended up in the map:
AtomicInteger newCount = new AtomicInteger(0);
AtomicInteger failedAmount = myMap.putIfAbsent("username", newCount);
if (failedAmount == null) {
    failedAmount = newCount; // no previous mapping, so our fresh counter is the one in the map
}
if (failedAmount.incrementAndGet() >= THRESHOLD) {
    // possible brute force detected! alert admin / slow down login
    // / or whatever
}
Unless you synchronize the whole block of code that does the update, it won't work as you expect anyway.
A synchronized map just makes sure that nothing nasty happens if you call, say, put several times simultaneously. It doesn't make sure that
myMap.put("username", myMap.get("username") + 1);
is executed atomically.
You should really synchronize the whole block that performs the update, either by using some Semaphore or by using the synchronized keyword. For example:
final Object lock = new Object();
...
synchronized (lock) {
    if (!myMap.containsKey(username)) {
        myMap.put(username, 0);
    }
    myMap.put(username, myMap.get(username) + 1);
    if (myMap.get(username) >= THRESHOLD) {
        // possible brute force detected! alert admin / slow down login
    }
}
The best way I see would be to store the fail counter with the user object, not in some kind of global map. That way the synchronization issue does not even come up.
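As a rough illustration of that idea, the counter can live on the user object itself as an AtomicInteger (the User class and method names below are made up for the example, not taken from the question):

import java.util.concurrent.atomic.AtomicInteger;

class User {
    // only the failure counter is shown; the rest of the user object is omitted
    private final AtomicInteger failedLogins = new AtomicInteger(0);

    // records a failed login and returns the new total for this user
    int recordFailedLogin() {
        return failedLogins.incrementAndGet();
    }

    // called after a successful login or at the daily reset
    void resetFailedLogins() {
        failedLogins.set(0);
    }
}

The login code would then simply check if (user.recordFailedLogin() >= THRESHOLD) without touching any shared map.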
If you still want to go with the map, you can get away with a partially synchronized approach if you use a mutable counter object:
static class FailCount {
    public int count;
}

// increment counter for user
FailCount count;
synchronized (lock) {
    count = theMap.get(user);
    if (count == null) {
        count = new FailCount();
        theMap.put(user, count);
    }
}
count.count++;
But most likely any optimization attempt here is a waste of time. It's not like your system will process millions of login failures a second, so your original code should do just fine.
The simplest solution I see here is to extract this code into a separate method and make it synchronized (or put all of the code into a synchronized block). Everything else can stay unchanged. The map variable should be made final.
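For illustration, a minimal sketch of that suggestion (the class and method names here are made up, and the THRESHOLD value is just a placeholder):

import java.util.HashMap;
import java.util.Map;

class LoginFailureTracker {
    private static final int THRESHOLD = 5; // placeholder value for the example
    private final Map<String, Integer> failedLogins = new HashMap<String, Integer>();

    // the whole get/increment/put sequence runs while holding the object's monitor
    synchronized void registerFailure(String username) {
        Integer failedAmount = failedLogins.get(username);
        int newAmount = (failedAmount == null) ? 1 : failedAmount + 1;
        failedLogins.put(username, newAmount);
        if (newAmount >= THRESHOLD) {
            // possible brute force detected! alert admin / slow down login
        }
    }

    // e.g. called by a scheduled job at midnight
    synchronized void reset() {
        failedLogins.clear();
    }
}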
Using a synchronized HashMap or a ConcurrentHashMap is only necessary if your monitoring application is multi-threaded. If that is the case, ConcurrentHashMap has significantly better performance under cases of high load/contention.
I would not dare use an unsynchronized structure with multiple threads if even one writer/updater thread exists. It's not just a matter of losing a few increments - the internal structure of the HashMap itself could be corrupted.
That said, if you want to ensure that no increment is lost, then even a synchronized Map is not enough:
1. UserX attempts to login.
2. Thread A gets count N for "UserX".
3. UserX attempts to login again.
4. Thread B gets count N for "UserX".
5. A puts N + 1 to the map.
6. B puts N + 1 to the map.

The map now contains N + 1 instead of N + 2.
To avoid this, either use a synchronized block around the whole get/put operation or use something along the lines of an AtomicInteger instead of a plain Integer for your counter.
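On Java 8 or later there is also a lock-free way to make the whole read-modify-write atomic for a single key: ConcurrentHashMap.merge. A minimal sketch, reusing the THRESHOLD constant from the question:

import java.util.concurrent.ConcurrentHashMap;

ConcurrentHashMap<String, Integer> failedLogins = new ConcurrentHashMap<>();

// merge() applies the remapping function atomically for this key
int failedAmount = failedLogins.merge("username", 1, Integer::sum);
if (failedAmount >= THRESHOLD) {
    // possible brute force detected! alert admin / slow down login
}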
Related
I am trying to support concurrency on a HashMap that gets periodically cleared. I have a cache that stores data for a period of time. Every 5 minutes, the data in this cache is sent to the server. Once I flush, I want to clear the cache. The problem is that data could be written to this map under an existing key while I am flushing. How would I go about making this process thread safe?
data class A(val a: AtomicLong, val b: AtomicLong) {
    fun changeA() {
        a.incrementAndGet()
    }
}

class Flusher {
    private val cache: MutableMap<String, A> = ConcurrentHashMap()
    private val lock = Any()

    fun retrieveA(key: String): A {
        synchronized(lock) {
            return cache.getOrPut(key) { A(AtomicLong(1), AtomicLong(0)) }
        }
    }

    fun flush() {
        synchronized(lock) {
            // send data to network request
            cache.clear()
        }
    }
}
// Existence of multiple classes like CacheChanger
class CacheChanger {
    fun incrementData() {
        flusher.retrieveA("x").changeA()
    }
}
I am worried that the above cache is not properly synchronized. Are there better/right ways to lock this cache so that I don't lose data? Should I create a deep copy of the cache and then clear it?
Since the data could be changed by another changer at the same time, could that not lead to problems?
You can get rid of the lock.
In the flush method, instead of reading the entire map (e.g. through an iterator) and then clearing it, remove each element one by one.
I'm not sure if you can use the iterator's remove method (I'll check that in a moment), but you can take the key set, iterate over it, and for each key invoke cache.remove() - this gives you the stored value and removes it from the cache atomically.
The tricky part is how to make sure that the object of class A won't be modified just prior to being sent over the network. You can do it as follows:
When you get some object x through retrieveA and modify it, you need to make sure it is still in the cache. Simply invoke retrieve one more time. If you get exactly the same object, it's fine. If it's different, then it means the object was removed and sent over the network, but you don't know whether the modification was also sent, or whether the state of the object prior to the modification was sent. Still, I think in your case you can simply repeat the whole process (apply the change and check whether the objects are the same). But it depends on the specifics of your application.
If you don't want to increment twice, then when sending the data over the network you'll have to read the content of the counter a, store it in a local variable and decrease a by that amount (usually it will end up at zero). Then in the CacheChanger, when you get a different object from the second retrieve, you can check whether the value is zero (your modification was taken into account) or non-zero, which means your modification came just a fraction of a second too late, and you'll have to repeat the process.
You could also replace incrementAndGet with compareAndSet, but this could yield slightly worse performance. In this approach, instead of incrementing, you try to swap in a value that is greater by one. And before sending over the network you try to swap the value to -1 to mark it as invalid. If the second swap fails, it means that someone has changed the value concurrently; you need to check it one more time in order to send the freshest value over the network, and you repeat the process in a loop (breaking the loop only once the swap to -1 succeeds). In the case of the swap to a value greater by one, you also repeat the process in a loop until the swap succeeds. If it fails, it either means that somebody else swapped in some greater value, or that the Flusher swapped in -1. In the latter case you know that you have to call retrieveA one more time to get a new object.
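To sketch that compare-and-swap idea in Java terms (AtomicLong is the same class the Kotlin code uses; the helper names and the -1 sentinel below are only illustrative), the two sides could look roughly like this:

import java.util.concurrent.atomic.AtomicLong;

class CasSketch {
    // Changer side: add one to the counter unless the flusher already
    // invalidated it with the -1 sentinel.
    static boolean tryIncrement(AtomicLong counter) {
        while (true) {
            long current = counter.get();
            if (current == -1) {
                return false; // entry was flushed; the caller must retrieve a fresh object
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Flusher side: atomically read the final value and invalidate the counter.
    static long readAndInvalidate(AtomicLong counter) {
        while (true) {
            long current = counter.get();
            if (counter.compareAndSet(current, -1)) {
                return current; // this is the freshest value to send over the network
            }
        }
    }
}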
The easiest solution (but with worse performance) is to rely completely on locks.
You can change ConcurrentHashMap to a regular HashMap.
Then you have to apply all your changes directly in the retrieveA function:
fun retrieveA(key: String, mod: (A) -> Unit): A {
    synchronized(lock) {
        val obj: A = cache.getOrPut(key) { A(AtomicLong(1), AtomicLong(0)) }
        mod(obj)
        return obj
    }
}
I hope it compiles (I'm not an expert on Kotlin).
Then you use it like:
class CacheChanger {
    fun incrementData() {
        flusher.retrieveA("x") { it.changeA() }
    }
}
Ok, I admit this code is not really Kotlin ;) you should use a Kotlin lambda instead of the Consumer interface. It's been some time since I played a bit with Kotlin. If someone could fix it I'd be very grateful.
This is just a concept:
I want to have an int be updated by a function that is running in multiple threads.
My function randomly chooses an int and checks if it's bigger than the existing int.
If so it updates the variable.
How do I ensure that my values never clash and that my function isn't looking at an older value that hasn't been updated yet?
I'm new to threads so any information is valuable.
This is a job for AtomicInteger!
AtomicInteger atomic = new AtomicInteger();
...
int newValue = atomic.accumulateAndGet(rand, Math::max);
How do I ensure my values never clash?
If you are new to threads, then I would say, use a synchronized block:
final Object lock = new Object();
int global_r = 0;
...
while (true) {
    int r = random.nextInt();
    synchronized (lock) {
        if (r > global_r) {
            global_r = r;
            break;
        }
    }
}
Only one thread at a time can enter the synchronized(lock) { ... } block. A thread that tries to enter while another thread is already inside will be forced to wait.
One should always try to have threads spend as little time as possible inside a synchronized(...) block, which is why I generate the random number outside of the block, and only do the test and the assignment inside it.
Another answer here suggests that you use AtomicInteger. That works too, but it is not what I would recommend for somebody who is taking their very first steps toward understanding how to use threads.
What I am trying to do is to implement a key-specific read-write lock. Multiple read requests can be executed concurrently if there is no write request on that key. Put requests on different keys can be executed concurrently.
I used a ConcurrentHashMap to save the key and keep a record of running write operations for each key.
My code looks like this:
ConcurrentHashMap<String, AtomicInteger> count;
...

public void getLock(String key) {
    synchronized (count.get(key)) {
        while (count.get(key).get() != 0) { // this means there are GET requests running
            try {
                count.get(key).wait();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
The idea is that when a new thread wants to read, it first needs to check whether there is any write on that key (whether the count is not 0): if not, it can go ahead; if yes, it needs to wait.
So I suppose I have to use count.get(key).wait(). However, Java forces me to synchronize on count.get(key) in order to use the wait() method.
I wonder whether it makes sense to use the synchronization here, since I already use AtomicInteger?
p.s.
I do have notify() later in the unlock method.
I just realized why I still need a synchronized block for AtomicInteger.
All the comments, as well as this link, are pretty useful:
"If the waiter didn't synchronize, then any old bit of code might change the predicate just before it goes to sleep, and then we're certainly in trouble."
So even with an AtomicInteger (actually, the data type of the value doesn't really matter), just before the thread goes to wait, another thread can change the value, and the check would be wrong.
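To make the whole pattern concrete, here is a minimal sketch of how the lock and unlock sides might fit together (the class and method names are illustrative, not the asker's actual code, and notifyAll is used rather than notify to be on the safe side); the important point is that the counter check, the wait(), and the notifyAll() all run while holding the same monitor:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class KeyLocks {
    private final ConcurrentHashMap<String, AtomicInteger> count = new ConcurrentHashMap<>();

    private AtomicInteger counterFor(String key) {
        return count.computeIfAbsent(key, k -> new AtomicInteger(0));
    }

    // reader side: wait until no write is running on this key
    void getLock(String key) throws InterruptedException {
        AtomicInteger c = counterFor(key);
        synchronized (c) {
            while (c.get() != 0) {
                c.wait(); // releases the monitor on c until unlock() notifies
            }
        }
    }

    // writer side: mark a write on this key as running
    void writeLock(String key) {
        AtomicInteger c = counterFor(key);
        synchronized (c) {
            c.incrementAndGet();
        }
    }

    // writer side: mark the write as finished and wake up waiting readers
    void unlock(String key) {
        AtomicInteger c = counterFor(key);
        synchronized (c) {
            c.decrementAndGet();
            c.notifyAll();
        }
    }
}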
I am making an application that takes a bunch of journal entries and calculates a sum.
Is the way below thread/concurrency safe when there are multiple threads calling the addToSum() method? I want to ensure that each call updates the total properly.
If it is not safe, please explain what I have to do to ensure thread safety.
Do I need to synchronize the get/put or is there a better way?
private ConcurrentHashMap<String, BigDecimal> sumByAccount;

public void addToSum(String account, BigDecimal amount) {
    BigDecimal newSum = sumByAccount.get(account).add(amount);
    sumByAccount.put(account, newSum);
}
Thanks so much!
Update:
Thanks everyone for the answer, I already get that the code above is not thread-safe.
Thanks Vint for suggesting the AtomicReference as an alternative to synchronized. I was using AtomicInteger to hold integer sums before, and I was wondering whether there is something like that for BigDecimal.
Is there a definitive conclusion on the pros and cons of the two?
You can use synchronized like the others suggested, but if you want a minimally blocking solution you can try AtomicReference as a store for the BigDecimal:
ConcurrentHashMap<String, AtomicReference<BigDecimal>> map;

public void addToSum(String account, BigDecimal amount) {
    AtomicReference<BigDecimal> newSum = map.get(account);
    for (;;) {
        BigDecimal oldVal = newSum.get();
        if (newSum.compareAndSet(oldVal, oldVal.add(amount)))
            return;
    }
}
Edit - I'll explain this more:
An AtomicReference uses CAS to atomically assign a single reference. The loop says this:
If the current reference stored in the AtomicReference == oldVal [the same object in memory, not just an equal value], then replace the reference stored in the AtomicReference with oldVal.add(amount). From any point after the for-loop, newSum.get() will return the BigDecimal object that has been added to.
You want to use a loop here because it is possible that two threads are trying to add to the same AtomicReference. It can happen that one thread succeeds and another thread fails; if that happens, just try again with the newly added value.
With moderate thread contention this would be a faster implementation; with high contention you are better off using synchronized.
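One detail the snippet above glosses over is how the AtomicReference for a brand-new account gets into the map in the first place - map.get(account) returns null until someone puts an entry there. A minimal sketch of a race-safe lazy initialization (the helper name is made up; the map field is the one from the answer):

import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

ConcurrentHashMap<String, AtomicReference<BigDecimal>> map = new ConcurrentHashMap<>();

AtomicReference<BigDecimal> sumRef(String account) {
    AtomicReference<BigDecimal> ref = map.get(account);
    if (ref == null) {
        AtomicReference<BigDecimal> fresh = new AtomicReference<BigDecimal>(BigDecimal.ZERO);
        // putIfAbsent returns the reference already in the map, or null if ours won the race
        AtomicReference<BigDecimal> existing = map.putIfAbsent(account, fresh);
        ref = (existing != null) ? existing : fresh;
    }
    return ref;
}

addToSum would then call sumRef(account) instead of map.get(account) before entering the CAS loop.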
Your solution is not thread safe. The reason is that it is possible for an amount to be missed, since the put operation is separate from the get operation (so the new value you are putting into the map could overwrite an amount that is being added at the same time).
The safest way to do what you want to do is to synchronize your method.
That is not safe, because threads A and B might both call sumByAccount.get(account) at the same time (more or less), so neither one will see the result of the other's add(amount). That is, things might happen in this sequence:
1. Thread A calls sumByAccount.get("accountX") and gets (for example) 10.0.
2. Thread B calls sumByAccount.get("accountX") and gets the same value that thread A did: 10.0.
3. Thread A sets its newSum to (say) 10.0 + 2.0 = 12.0.
4. Thread B sets its newSum to (say) 10.0 + 5.0 = 15.0.
5. Thread A calls sumByAccount.put("accountX", 12.0).
6. Thread B calls sumByAccount.put("accountX", 15.0), overwriting what thread A did.
One way to fix this is to put synchronized on your addToSum method, or to wrap its contents in synchronized(this) or synchronized(sumByAccount). Another way, since the above sequence of events only happens if two threads are updating the same account at the same time, might be to synchronize externally based on some sort of Account object. Without seeing the rest of your program logic, I can't be sure.
Yes, you need to synchronize, since otherwise two threads can each get the same value (for the same key), say A, and then thread 1 adds B to it and thread 2 adds C to it and stores it back. The result will then not be A+B+C, but A+B or A+C.
What you need to do is lock on something that is common to the additions. Synchronizing on get/put will not help, unless you do
synchronize {
    get
    add
    put
}
but if you do that, then you will prevent threads from updating values even when they are for different keys. You want to synchronize on the account. However, synchronizing on the string seems unsafe, as it could lead to deadlocks (you don't know what else locks the string). Can you create an account object instead and use that for locking?
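To illustrate that last suggestion, here is one possible way to get a dedicated lock object per account without locking on the String itself (the class layout below is made up for the example, and computeIfAbsent requires Java 8+):

import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;

class SumByAccount {
    private final ConcurrentHashMap<String, BigDecimal> sums = new ConcurrentHashMap<>();
    // one dedicated lock object per account, created on first use
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();

    public void addToSum(String account, BigDecimal amount) {
        Object lock = locks.computeIfAbsent(account, k -> new Object());
        synchronized (lock) { // serializes the read-modify-write for this account only
            BigDecimal current = sums.get(account);
            sums.put(account, current == null ? amount : current.add(amount));
        }
    }
}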
I am planning to use this scheme in my application, but I am not sure whether it is safe.
To give a little background, a bunch of servers will compute the results of sub-tasks that belong to a single task and report them back to a central server. This piece of code is used to register the results and also to check whether all the subtasks for the task have completed, and if so, to report that fact only once.
The important point is that every task must be reported once and only once, as soon as it is completed (all subTaskResults are set).
Can anybody help? Thank you! (Also, if you have a better idea to solve this problem, please let me know!)
*Note that I simplified the code for brevity.
Solution I
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    Semaphore permission = new Semaphore(1);

    public Task set(Id id, SubTaskResult result) {
        //null check omitted
        subtasks.get(id).set(result);
        return check() ? this : null;
    }

    private boolean check() {
        for (AtomicReference<SubTaskResult> ref : subtasks.values()) {
            if (ref.get() == null) {
                return false;
            }
        }//for
        return permission.tryAcquire();
    }
}//class
Stephen C kindly suggested using a counter. Actually, I considered that once, but I reasoned that the JVM could reorder the operations and thus a thread could observe a decremented counter (by another thread) before the result is set in the AtomicReference (by that other thread).
*EDIT: I now see this is thread safe. I'll go with this solution. Thanks, Stephen!
Solution II
class Task {
    //Populate with bunch of (Long, new AtomicReference()) pairs
    //Actual app uses read only HashMap
    Map<Id, AtomicReference<SubTaskResult>> subtasks = populatedMap();
    AtomicInteger counter = new AtomicInteger(subtasks.size());

    public Task set(Id id, SubTaskResult result) {
        //null check omitted
        subtasks.get(id).set(result);
        //In the actual app, if !compareAndSet(null, result) return null;
        return check() ? this : null;
    }

    private boolean check() {
        return counter.decrementAndGet() == 0;
    }
}//class
I assume that your use-case is that there are multiple threads calling set, but for any given value of id, the set method will be called only once. I'm also assuming that populatedMap creates the entries for all used id values, and that subtasks and permission are really private.
If so, I think that the code is thread-safe.
Each thread should see the initialized state of the subtasks Map, complete with all keys and all AtomicReference references. This state never changes, so subtasks.get(id) will always return the right reference. The set(result) call operates on an AtomicReference, so the subsequent get() calls in check() will see the most up-to-date values ... in all threads. Any potential races between multiple threads calling check seem to sort themselves out.
However, this is a rather complicated solution. A simpler solution would be to use a concurrent counter; e.g. replace the Semaphore with an AtomicInteger and use decrementAndGet instead of repeatedly scanning the subtasks map in check.
In response to this comment in the updated solution:
Actually, I have considered that once, but I reasoned that the JVM could reorder the operations and thus, a thread can observe a decremented counter (by another thread) before the result is set in AtomicReference (by that other thread).
The AtomicInteger and AtomicReference by definition are atomic. Any thread that tries to access one is guaranteed to see the "current" value at the time of the access.
In this particular case, each thread calls set on the relevant AtomicReference before it calls decrementAndGet on the AtomicInteger. This cannot be reordered. Actions performed by a thread are performed in order, and since these are atomic actions, the effects will be visible to other threads in order as well.
In other words, it should be thread-safe ... AFAIK.
The atomicity guaranteed (per class documentation) explicitly for AtomicReference.compareAndSet extends to set and get methods (per package documentation), so in that regard your code appears to be thread-safe.
I am not sure, however, why you have Semaphore.tryAcquire as a side-effect there; without complementary code to release the semaphore, that part of your code looks wrong.
The second solution does provide a thread-safe latch, but it's vulnerable to calls to set() that provide an ID that's not in the map (which would trigger a NullPointerException) or more than one call to set() with the same ID. The latter would mistakenly decrement the counter too many times and falsely report completion when there are presumably other subtask IDs for which no result has been submitted. My criticism isn't with regard to the thread safety, but rather to the invariant maintenance; the same flaw would be present even without the thread-related concern.
Another way to solve this problem is with AbstractQueuedSynchronizer, but it's somewhat gratuitous: you can implement a stripped-down counting semaphore, where each call set() would call releaseShared(), decrementing the counter via a spin on compareAndSetState(), and tryAcquireShared() would only succeed when the count is zero. That's more or less what you implemented above with the AtomicInteger, but you'd be reusing a facility that offers more capabilities you can use for other portions of your design.
To flesh out the AbstractQueuedSynchronizer-based solution requires adding one more operation to justify the complexity: being able to wait on the results from all the subtasks to come back, such that the entire task is complete. That's Task#awaitCompletion() and Task#awaitCompletion(long, TimeUnit) in the code below.
Again, it's possibly overkill, but I'll share it for the purpose of discussion.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

final class Task
{
  private static final class Sync extends AbstractQueuedSynchronizer
  {
    public Sync(int count)
    {
      setState(count);
    }

    @Override
    protected int tryAcquireShared(int ignored)
    {
      return 0 == getState() ? 1 : -1;
    }

    @Override
    protected boolean tryReleaseShared(int ignored)
    {
      int current;
      do
      {
        current = getState();
        if (0 == current)
          return true;
      }
      while (!compareAndSetState(current, current - 1));
      return 1 == current;
    }
  }

  public Task(int count)
  {
    if (count < 0)
      throw new IllegalArgumentException();
    sync_ = new Sync(count);
  }

  public boolean set(int id, Object result)
  {
    // Ensure that "id" refers to an incomplete task. Doing so requires
    // additional synchronization over the structure mapping subtask
    // identifiers to results.

    // Store result somehow.

    return sync_.releaseShared(1);
  }

  public void awaitCompletion()
    throws InterruptedException
  {
    sync_.acquireSharedInterruptibly(0);
  }

  public void awaitCompletion(long time, TimeUnit unit)
    throws InterruptedException
  {
    sync_.tryAcquireSharedNanos(0, unit.toNanos(time));
  }

  private final Sync sync_;
}
I have a weird feeling reading your example program, but it depends on the larger structure of your program what to do about that. A set function that also checks for completion is almost a code smell. :-) Just a few ideas.
If you have synchronous communication with your servers, you might use an ExecutorService with the same number of threads as the number of servers doing the communication. From this you get a bunch of Futures, and you can naturally proceed with your calculation - the get calls will block at the moment the result is needed but not yet there.
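A rough sketch of that synchronous variant (Server, SubTaskResult and queryServer are placeholders for whatever the real communication code looks like):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

List<SubTaskResult> collectResults(List<Server> servers) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(servers.size());
    List<Future<SubTaskResult>> futures = new ArrayList<Future<SubTaskResult>>();
    for (final Server server : servers) {
        futures.add(pool.submit(new Callable<SubTaskResult>() {
            public SubTaskResult call() throws Exception {
                return queryServer(server); // placeholder for the actual request
            }
        }));
    }
    List<SubTaskResult> results = new ArrayList<SubTaskResult>();
    for (Future<SubTaskResult> future : futures) {
        results.add(future.get()); // blocks until this subtask's result is available
    }
    pool.shutdown();
    return results;
}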
If you have asynchronous communication with the servers, you might also use a CountDownLatch after submitting the task to the servers. The await call blocks the main thread until the completion of all subtasks, and the other threads can receive the results and call countDown for each received result.
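A minimal sketch of the latch idea (the class and method names are illustrative, and storing each result is assumed to be thread-safe on its own):

import java.util.concurrent.CountDownLatch;

class TaskCompletion {
    private final CountDownLatch latch;

    TaskCompletion(int numberOfSubtasks) {
        this.latch = new CountDownLatch(numberOfSubtasks);
    }

    // called by whichever thread receives a subtask result
    void onSubTaskResult(Object result) {
        // store the result somewhere thread-safe, then:
        latch.countDown();
    }

    // called once by the thread that has to report overall completion
    void awaitAllSubtasks() throws InterruptedException {
        latch.await(); // returns once countDown() has run numberOfSubtasks times
        // all subtask results are in; report completion exactly once here
    }
}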
With all these methods you don't need special thread-safety measures beyond making sure that the concurrent storing of the results in your structure is thread-safe. And I bet there are even better patterns for this.