Why is the following block not thread safe?

We have a Spring Boot application that fetches data from a database and keeps it in a ConcurrentHashMap, tempMap. We then look up details in it based on incoming keys, which arrive as arrays. Once loaded, the map is not refreshed until the scheduler runs. However, we see an empty Optional being returned even though the map was never refreshed. The method is called by many threads in the application. It seems there is some kind of memory leak. Can anyone help point out what the issue might be?
This is the function body, accessed by many threads, that throws a "No value present" exception:
if (sender.length == 1) {
    return getSingle(sender[0]).get();
} else {
    return getMultiple(sender).get();
}
These are the implementations of the functions:
private Optional<DummyClassEntity> getSingle(String value) {
    Optional<DummyClassEntity> dummyClass;
    dummyClass = Optional.of(tempMap.get(value));
    return dummyClass;
}

private Optional<DummyClassEntity> getMultiple(String[] values) {
    return Arrays.stream(values)
            .filter(value -> tempMap.containsKey(value))
            .findFirst()
            .map(s -> tempMap.get(s));
}
I tried using a synchronized block, but that hurts performance, and this code is part of a high-performance application.
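A possible explanation, offered as a sketch rather than a diagnosis of the actual application: Optional.of throws a NullPointerException when tempMap.get returns null, and in getMultiple the containsKey check and the later get are two separate operations. If an entry disappears between them (for example while a refresh clears and refills the map), Optional.map produces an empty Optional, and calling get() on it throws "No value present". The same exception occurs whenever none of the keys match. The class below is hypothetical (SenderLookup, refresh, and the String values are assumptions, not the original code); it does a single lookup per key and publishes a fully built map on refresh, without synchronized.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for the question's lookup class; values are Strings for brevity.
final class SenderLookup {
    // Immutable snapshot, replaced wholesale on refresh so readers never see a half-filled map.
    private volatile Map<String, String> snapshot = Map.of();

    // Called by the scheduler: the new map is built first, then published in one write.
    void refresh(Map<String, String> fresh) {
        snapshot = Map.copyOf(fresh);
    }

    Optional<String> getSingle(String key) {
        // ofNullable yields an empty Optional for a missing key instead of throwing.
        return Optional.ofNullable(snapshot.get(key));
    }

    Optional<String> getMultiple(String[] keys) {
        Map<String, String> current = snapshot; // read the reference once
        return Arrays.stream(keys)
                .map(current::get)          // one lookup per key, no containsKey-then-get gap
                .filter(Objects::nonNull)
                .findFirst();
    }
}
```

Callers would then use orElse or orElseThrow with a meaningful exception rather than calling get() directly on the result.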

Related

Creating a loop using spring-webflux and avoid memory issues

I am currently working on a project where I need to create a loop using spring webflux to generate a Flux for downstream processing. The loop should sequentially take batches of elements from a source (in this instance a repository) and pass the elements as signal in a Flux. To acquire the elements, we have a repository method which fetches the next batch. When all elements have been processed, the method yields an empty List.
I have identified that I can use Flux::generate in the following manner:
Flux.<List<Object>>generate(sink -> {
    List<Object> batch = repository.fetch();
    if (batch.isEmpty()) {
        sink.complete();
    } else {
        sink.next(batch);
    }
})
...
However, when I use this, the argument method runs continuously, buffering until I run out of memory.
I have also tried using Flux::create, but I am struggling to find an appropriate approach. I have found that I can do the following:
Consumer<Integer> sinker;

Flux<?> createFlux() {
    return Flux.<List<Object>>create(sink -> sinker = integer -> {
            List<Object> batch = repository.fetch();
            if (batch.isEmpty()) {
                sink.complete();
            } else {
                sink.next(batch);
            }
        })
        ...
        .doOnNext(x -> sinker.accept(y))
        ...
}
Then I just need to call the Consumer initially to initiate the loop.
However, I feel like I am overly complicating a job which should have a fairly standard implementation. Also, this implementation requires secondary calls to get started, and I haven't found a decent way to initiate it within the pipeline (for instance, using .onSubscribe() doesn't work, as it attempts to call the Consumer before it has been assigned).
So in summary, I am looking for a simple way to create an unbounded loop while controlling the backpressure to avoid outOfMemory-errors.

I believe I have found a simpler solution which serves my need. The method Mono::repeat(BooleanSupplier) allows me to loop until the list is empty, simply by:
Mono.fromCallable(() -> repository.nextBatch())
    .flatMap(/* do some stuff here */)
    .repeat(() -> repository.hasNext())
If other more elegant solutions exist, I am still open for suggestions.
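For comparison only, the same pull-until-empty loop can be sketched without Reactor using java.util.stream. This is a plain-JDK analogue, not the Mono::repeat solution above; the Repo class and its fetch() method are hypothetical stand-ins for the repository. The stream is lazy, so a batch is fetched only when the consumer asks for the next element.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.stream.Stream;

final class BatchLoop {
    // Hypothetical repository: hands out batches, then an empty list once exhausted.
    static final class Repo {
        private final Queue<List<Integer>> batches = new ArrayDeque<>();

        Repo(List<List<Integer>> data) {
            batches.addAll(data);
        }

        List<Integer> fetch() {
            List<Integer> next = batches.poll();
            return next == null ? List.of() : next;
        }
    }

    // Pulls one batch at a time on demand and stops at the first empty batch.
    static Stream<List<Integer>> batches(Repo repo) {
        return Stream.generate(repo::fetch)
                .takeWhile(batch -> !batch.isEmpty());
    }
}
```

Because nothing is fetched ahead of the consumer, at most one batch is held at a time in this sketch.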

Is there a way to successfully execute nested flux operations without actually blocking your code?

While working with Spring Webflux, I'm trying to insert some data in the realm object server which interacts with Java apps via a Rest API. So basically I have a set of students, who have a set of subjects and my objective is to persist those subjects in a non-blocking manner. So I use a microservice exposed via a rest endpoint which provides me with a Flux of student roll numbers, and for that flux, I use another microservice exposed via a rest endpoint that gets me the Flux of subjects, and for each of these subjects, I want to persist them in the realm server via another rest endpoint. I wanted to make this all very nonblocking which is why I wanted my code to look like this.
void foo() {
    studentService.getAllRollnumbers().flatMap(rollnumber -> {
        return subjectDirectory.getAllSubjects().map(subject -> {
            return dbService.addSubject(subject);
        });
    });
}
But this doesn't work for some reason. Once I call block() on things, however, they fall into place, something like this:
Flux<Done> foo() {
    List<Integer> rollNumbers = studentService.getAllRollnumbers().collectList().block();
    rollNumbers.forEach(rollNumber -> {
        List<Subject> subjects = subjectDirectory.getAllSubjects().collectList().block();
        subjects.forEach(subject -> {
            dbService.addSubject(subject).block();
        });
    });
    return Flux.just(new NotUsed());
}
getAllRollnumbers() returns a Flux of integers, getAllSubjects() returns a Flux of Subject, and addSubject() returns a Mono of the DBResponse POJO.
What I understand is that the thread executing this function finishes before much of it gets triggered. Please help me make this code work in an async, non-blocking manner.

You are not subscribing to the Publisher at all in the first snippet, which is why it does not execute. Also, because addSubject() returns a Mono, the inner operator needs to be flatMap rather than map; otherwise the returned Monos are never subscribed to either. You can do this:
studentService.getAllRollnumbers().flatMap(rollnumber -> {
    return subjectDirectory.getAllSubjects().flatMap(subject -> {
        return dbService.addSubject(subject);
    });
}).subscribe();
However, it is usually better to let the framework take care of the subscription, but without seeing the rest of the code I can't advise further.

Best way to sequence a pair of external service calls in Akka

I need to geocode an Address object, and then store the updated Address in a search engine. This can be simplified to taking an object, performing one long-running operation on the object, and then persisting the object. This means there is an order of operations requirement that the first operation be complete before persistence occurs.
I would like to use Akka to move this off the main thread of execution.
My initial thought was to use a pair of Futures to accomplish this, but the Futures documentation is not entirely clear on which behavior (fold, map, etc) guarantees one Future to be executed before another.
I started out by creating two functions, defferedGeocode and deferredWriteToSearchEngine which return Futures for the respective operations. I chain them together using Future<>.andThen(new OnComplete...), but this gets clunky very quickly:
Future<Address> geocodeFuture = defferedGeocode(ec, address);
geocodeFuture.andThen(new OnComplete<Address>() {
    public void onComplete(Throwable failure, Address geocodedAddress) {
        if (geocodedAddress != null) {
            Future<Address> searchEngineFuture = deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
            searchEngineFuture.andThen(new OnComplete<Address>() {
                public void onComplete(Throwable failure, Address savedAddress) {
                    // process search engine results
                }
            });
        }
    }
}, ec);
And then deferredGeocode is implemented like this:
private Future<Address> defferedGeocode(
        final ExecutionContext ec,
        final Address address) {
    return Futures.future(new Callable<Address>() {
        public Address call() throws Exception {
            log.debug("Geocoding Address...");
            return address;
        }
    }, ec);
};
deferredWriteToSearchEngine is pretty similar to deferredGeocode, except it takes the search engine service as an additional final parameter.
My understanding is that Futures are supposed to be used to perform calculations and should not have side effects. In this case, geocoding the address is a calculation, so I think using a Future is reasonable, but writing to the search engine is definitely a side effect.
What is the best practice here for Akka? How can I avoid all the nested calls, but ensure that both the geocoding and the search engine write are done off the main thread?
Is there a more appropriate tool?
Update:
Based on Viktor's comments below, I am trying this code out now:
ExecutionContext ec;

private Future<Address> addressBackgroundProcess(Address address) {
    Future<Address> geocodeFuture = addressGeocodeFutureFactory.defferedGeocode(address);
    return geocodeFuture.flatMap(new Mapper<Address, Future<Address>>() {
        @Override
        public Future<Address> apply(Address geoAddress) {
            return addressSearchEngineFutureFactory.deferredWriteToSearchEngine(geoAddress);
        }
    }, ec);
}
This seems to work ok except for one issue which I'm not thrilled with. We are working in a Spring IOC code base, and so I would like to inject the ExecutionContext into the FutureFactory objects, but it seems wrong for this function (in our DAO) to need to be aware of the ExecutionContext.
It seems odd to me that the flatMap() function needs an EC at all, since both futures provide one.
Is there a way to maintain the separation of concerns? Am I structuring the code badly, or is this just the way it needs to be?
I thought about creating an interface in the FutureFactory classes that would allow chaining of FutureFactory instances, so the flatMap() call would be encapsulated in a FutureFactory base class, but this seems like it would be deliberately subverting an intentional Akka design decision.

Warning: Pseudocode ahead.
Future<Address> myFutureResult = deferredGeocode(ec, address).flatMap(
    new Mapper<Address, Future<Address>>() {
        public Future<Address> apply(Address geocodedAddress) {
            return deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
        }
    }, ec).map(
    new Mapper<Address, SomeResult>() {
        public SomeResult apply(Address savedAddress) {
            // Create SomeResult after deferredWriteToSearchEngine is done
        }
    }, ec);
See how it is not nested. flatMap and map are used for sequencing the operations. "andThen" is useful when you want a side-effecting-only operation to run to full completion before passing the result on. Of course, if you map twice on the SAME future instance then no ordering is guaranteed, but since we are flatMapping and mapping on the returned futures (new ones according to the docs), there is a clear data flow in our program.
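The same sequencing idea can be sketched with the JDK's CompletableFuture, where thenCompose plays the role of flatMap and thenApply the role of map. This is an analogue under stated assumptions, not Akka code: geocode and writeToSearchEngine are hypothetical stand-ins that work on plain strings.

```java
import java.util.concurrent.CompletableFuture;

final class SequencedCalls {
    // Hypothetical stand-in for the long-running geocoding call.
    static CompletableFuture<String> geocode(String address) {
        return CompletableFuture.supplyAsync(() -> address + " [geocoded]");
    }

    // Hypothetical stand-in for the search-engine write.
    static CompletableFuture<String> writeToSearchEngine(String geocoded) {
        return CompletableFuture.supplyAsync(() -> "saved:" + geocoded);
    }

    static CompletableFuture<String> process(String address) {
        return geocode(address)
                // Like flatMap: starts the write only after geocoding has completed.
                .thenCompose(SequencedCalls::writeToSearchEngine)
                // Like map: transforms the saved value once the write has completed.
                .thenApply(saved -> "result for " + saved);
    }
}
```

Each stage is attached to the future returned by the previous stage, so the order is fixed by the data flow and no callbacks are nested.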

Spring cache, TTL unless service is down

I have an interesting task where I need to cache the results of my method, which is really simple with the Spring cache abstraction:
@Cacheable(...)
public String getValue(String key) {
    return restService.getValue(key);
}
The restService.getValue() call targets a REST service, which may or may not answer, depending on whether the endpoint is down.
I need to set a specific TTL for the cached value, let's say 5 minutes, but if the server is down I need to return the last value, even if it is older than 5 minutes.
I was thinking about having a second cacheable method which has no TTL and always returns the last value; it would be called from getValue if restService returns nothing. But maybe there is a better way?

I've been interested in doing this for a while too. Sorry to say, I have not found any trivial way of doing it. Spring will not do this for you; it's more a question of whether the cache implementation that Spring is wrapping can do it. I assume you are using the EhCache implementation. Unfortunately, this functionality does not come out of the box as far as I know.
There are various ways to achieve something similar, depending on your problem:
1) Use an eternal cache time and have a second Thread class which periodically loops over the cached data, refreshing it. I have not done exactly this, but the Thread class would need to look something like this:
@Autowired
EhCacheCacheManager ehCacheCacheManager;
...
// in the infinite loop
Ehcache ehcache = (Ehcache) ehCacheCacheManager.getCache("test").getNativeCache();
List keys = ehcache.getKeys();
for (int i = 0; i < keys.size(); i++) {
    Object o = keys.get(i);
    Element item = ehcache.get(o);
    // get the data based on some info in the value, and if no exceptions
    // (newValue stands for the freshly fetched data)
    ehcache.put(new Element(item.getKey(), newValue));
}
The benefit is that this is very fast for the @Cacheable caller; the downside is that your server might get more hits than necessary.
2) You could make a CacheListener that listens for the eviction event and stores the data temporarily. Should the server call fail, use that data and return it from the method.
The ehcache.xml:
<cacheEventListenerFactory class="caching.MyCacheEventListenerFactory"/>
</cache>
</ehcache>
The factory:
import net.sf.ehcache.event.CacheEventListener;
import net.sf.ehcache.event.CacheEventListenerFactory;
import java.util.Properties;

public class MyCacheEventListenerFactory extends CacheEventListenerFactory {
    @Override
    public CacheEventListener createCacheEventListener(Properties properties) {
        return new CacheListener();
    }
}
The Pseudo-implementation
import net.sf.ehcache.CacheException;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;
import net.sf.ehcache.event.CacheEventListener;
import java.util.concurrent.ConcurrentHashMap;

public class CacheListener implements CacheEventListener {
    // prob bad practice to use a global static here - but it's just for demo purposes
    public static ConcurrentHashMap myMap = new ConcurrentHashMap();

    @Override
    public void notifyElementPut(Ehcache ehcache, Element element) throws CacheException {
        // we can remove it since the put happens after a method return
        myMap.remove(element.getKey());
    }

    @Override
    public void notifyElementExpired(Ehcache ehcache, Element element) {
        // expired item, we should store this
        myMap.put(element.getKey(), element.getValue());
    }
    //....
}
A challenge here is that the key is not very useful; you might need to store something about the key in the returned value to be able to pick it up if the server call fails. This feels a bit hacky, and I have not determined whether it is bulletproof. It might need some testing.
3) A lot of effort, but it works:
@Cacheable("test")
public MyObject getValue(String data) {
    try {
        MyObject result = callServer(data);
        storeResultSomewhereLikeADatabase(result);
        return result;
    } catch (Exception ex) {
        // server call failed: fall back to the last stored result
        return getStoredResult(data);
    }
}
An advantage here is that it will work across server restarts, and you can easily extend it to allow shared caches between clustered servers.
I had a version in a clustered environment of 12, where each node checked the database first to see whether any other node had already fetched the "expensive" data, and then reused it rather than making the server call.
A slight variant would be to use a second @Cacheable method together with @CachePut, rather than a DB, to store the data. But this would mean doubling up on memory usage. That might be acceptable depending on your result sizes.
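The behaviour the question asks for (a TTL, but the last value is served while the service is down) can also be sketched independently of Spring and Ehcache. The class below is a hypothetical helper, not an API of either library: StaleOnErrorCache, its loader function, and the injectable clock are assumptions made for illustration, and a failing service is assumed to surface as a RuntimeException.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper: serves fresh values within the TTL, stale ones if a reload fails.
final class StaleOnErrorCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;

        Entry(V value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    StaleOnErrorCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.loadedAt < ttlMillis) {
            return cached.value; // still within the TTL
        }
        try {
            // Concurrent callers may both reload an expired key; acceptable for a sketch.
            V fresh = loader.apply(key);
            entries.put(key, new Entry<>(fresh, now));
            return fresh;
        } catch (RuntimeException e) {
            if (cached != null) {
                return cached.value; // service down: serve the last known value
            }
            throw e; // nothing cached yet, so the failure has to surface
        }
    }
}
```

Unlike the database-backed variant above, this keeps the stale values in memory only, so they do not survive a restart.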

Maybe you can use SpEL to switch the cache being used (one with a TTL and one without) depending on whether a condition (is the service up?) is true or false. I've never used SpEL this way (I have used it to change the key based on some request params), but I think it could work:
@Cacheable(value = "T(com.xxx.ServiceChecker).checkService()",...)
where checkService() is a static method that returns the name of the cache that should be used.

Pre-load values for a Guava Cache

I have a requirement where we are loading static data from a database for use in a Java application. Any caching mechanism should have the following functionality:
1) Load all static data from the database (once loaded, this data will not change)
2) Load new data from the database (data present in the database at start-up will not change but it is possible to add new data)
Lazy loading of all the data isn't an option as the application will be deployed to multiple geographical locations and will have to communicate with a single database. Lazy loading the data will make the first request for a specific element too slow where the application is in a different region to the database.
I have been using the MapMaker API in Guava with success but we are now upgrading to the latest release and I can't seem to find the same functionality in the CacheBuilder API; I can't seem to find a clean way of loading all data at start-up.
One way would be to load all keys from the database and load those through the Cache individually. This would work but would result in N+1 calls to the database, which isn't quite the efficient solution I'm looking for.
public void loadData() {
    List<String> keys = getAllKeys();
    for (String s : keys)
        cache.get(s);
}
Or the other solution is to use a ConcurrentHashMap implementation and handle all of the threads and missing entries myself? I'm not keen on doing this as the MapMaker and CacheBuilder APIs provide the key-based thread locking for free without having to provide extra testing. I'm also pretty sure the MapMaker/CacheBuilder implementations will have some efficiencies that I don't know about/haven't got time to investigate.
public Element get(String key) {
    Lock lock = getObjectLock(key);
    lock.lock();
    try {
        Element ret = map.get(key);
        if (ret == null) {
            ret = getElement(key); // database call
            map.put(key, ret);
        }
        return ret;
    } finally {
        lock.unlock();
    }
}
Can anyone think of a better solution to my two requirements?
Feature Request
I don't think pre-loading a cache is an uncommon requirement, so it would be nice if the CacheBuilder provided a configuration option to pre-load the cache. I think providing an Interface (much like CacheLoader) which will populate the cache at start-up would be an ideal solution, such as:
CacheBuilder.newBuilder().populate(new CachePopulator<String, Element>() {
    @Override
    public Map<String, Element> populate() throws Exception {
        return getAllElements();
    }
}).build(new CacheLoader<String, Element>() {
    @Override
    public Element load(String key) throws Exception {
        return getElement(key);
    }
});
This implementation would allow the Cache to be pre-populated with all relevant Element objects, whilst keeping the underlying CustomConcurrentHashMap non-visible to the outside world.

In the short-term I would just use Cache.asMap().putAll(Map<K, V>).
Once Guava 11.0 is released you can use Cache.getAll(Iterable<K>), which will issue a single bulk request for all absent elements.

I'd load all static data from the DB and store it in the Cache using cache.asMap().put(key, value) (Guava 10.0.1 allows write operations on the Cache.asMap() view).
Of course, this static data might get evicted, if your cache is configured to evict entries...
The CachePopulator idea is interesting.
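For the ConcurrentHashMap route mentioned in the question, the two requirements can be sketched with the JDK alone: putAll for the start-up load and computeIfAbsent for keys added later, which gives per-key atomic loading without a hand-written lock. PreloadedCache is a hypothetical helper, not a Guava API, and unlike a Guava cache it never evicts.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper using only the JDK; no eviction, expiry, or statistics.
final class PreloadedCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<? super K, ? extends V> loader;

    // initial is the result of one bulk query at start-up; loader fetches a single new key.
    PreloadedCache(Map<? extends K, ? extends V> initial, Function<? super K, ? extends V> loader) {
        this.loader = loader;
        map.putAll(initial);
    }

    V get(K key) {
        // The loader runs at most once per absent key; other threads asking for it wait.
        return map.computeIfAbsent(key, loader);
    }
}
```

Preloaded keys never reach the loader, so only data added to the database after start-up costs a per-key call.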
