I am facing a race condition while implementing a process function on Flink connected streams. I have a cache map that is shared between the two functions processElement1 and processElement2, which are being called in parallel by two different threads.
Streams1 ---> (sending offer data)
Streams2 ---> (sending LMS (loyalty management system) data)
connect = Streams1.connect(Streams2);
connect.process(new TriggerStream());
In the TriggerStream class I store the data in the cache, using MemberId as the unique key to store and look up entries. When data is flowing in, I am not getting consistent results.
class LRUConcurrentCache<K,V>{
private final Map<K,V> cache;
private final int maxEntries;
public LRUConcurrentCache(final int maxEntries) {
this.cache = new LinkedHashMap<K,V>(maxEntries, 0.75F, true) {
private static final long serialVersionUID = -1236481390177598762L;
@Override
protected boolean removeEldestEntry(Map.Entry<K,V> eldest){
return size() > maxEntries;
}
};
}
// Why can't we lock on the key?
public void put(K key, V value) {
synchronized(key) {
cache.put(key, value);
}
}
// get method
public V get(K key) {
synchronized(key) {
return cache.get(key);
}
}
}
public class TriggerStream extends CoProcessFunction<IOffer, LMSData, String> {
private static final long serialVersionUID = 1L;
LRUConcurrentCache<String, String> cache;
private String offerNode;
String updatedValue, retrivedValue;
Subscriber subscriber;
TriggerStream(){
this.cache = new LRUConcurrentCache<>(10);
}
@Override
public void processElement1(IOffer offer, Context ctx, Collector<String> out) throws Exception {
try {
ObjectMapper mapper = new ObjectMapper();
mapper.configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false);
mapper.enableDefaultTyping();
// TODO Auto-generated method stub
IOffer latestOffer = offer;
//Check the subscriber is there or not
retrivedValue = cache.get(latestOffer.getMemberId().toString());
if ((retrivedValue == null)) {
//Subscriber is the class that is used and converted as Json String & then store into map
Subscriber subscriber = new Subscriber();
subscriber.setMemberId(latestOffer.getMemberId());
ArrayList<IOffer> offerList = new ArrayList<IOffer>();
offerList.add(latestOffer);
subscriber.setOffers(offerList);
updatedValue = mapper.writeValueAsString(subscriber);
cache.put(subscriber.getMemberId().toString(), updatedValue);
} else {
Subscriber subscriber = mapper.readValue(retrivedValue, Subscriber.class);
List<IOffer> offers = subscriber.getOffers();
offers.add(latestOffer);
updatedValue= mapper.writeValueAsString(subscriber);
cache.put(subscriber.getMemberId().toString(), updatedValue);
}
} catch (Exception pb) {
applicationlogger.error("Exception in Offer Loading:"+pb);
applicationlogger.debug("*************************FINISHED OFFER LOADING*******************************");
}
applicationlogger.debug("*************************FINISHED OFFER LOADING*******************************");
}
@Override
public void processElement2(LMSData lms, Context ctx, Collector<String> out) throws Exception {
try {
ObjectMapper mapper = new ObjectMapper();
mapper.configure(SerializationFeature.FAIL_ON_EMPTY_BEANS, false);
mapper.enableDefaultTyping();
// TODO Auto-generated method stub
//Check the subscriber is there or not
retrivedValue = cache.get(lms.getMemberId().toString());
if(retrivedValue !=null){
Subscriber subscriber = mapper.readValue(retrivedValue, Subscriber.class);
//do some calculations
String updatedValue = mapper.writeValueAsString(subscriber);
//Update value
cache.put(subscriber.getMemberId().toString(), updatedValue);
}
} catch (Exception pb) {
applicationlogger.error("Exception in Offer Loading:"+pb);
applicationlogger.debug("*************************FINISHED OFFER LOADING*******************************");
}
applicationlogger.debug("*************************FINISHED OFFER LOADING*******************************");
}
}
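About the "Why can't we lock on the key" question in LRUConcurrentCache: synchronized(key) only excludes threads that lock the very same key object, so two threads working on different member ids (or on equal ids held in distinct String instances, such as the ones produced by getMemberId().toString()) still run put and get on the shared LinkedHashMap at the same time. In access-order mode even get reorders the map's internal list (the LinkedHashMap Javadoc calls this a structural modification), so every access needs one shared lock. A minimal plain-Java sketch, with no Flink involved; the class name SynchronizedLruCache is made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LRU cache where every operation uses ONE lock (this object),
// instead of locking on whatever key object the caller happens to pass.
class SynchronizedLruCache<K, V> {
    private final Map<K, V> map;

    SynchronizedLruCache(final int maxEntries) {
        // accessOrder = true: get() moves the entry to the end, so reads also modify the map
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized int size() {
        return map.size();
    }
}
```

This only makes each single call atomic; the get-then-put sequence in processElement1 is still two separate calls. As the answer points out, in Flink such a cache is better kept as state.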
Flink does not give guarantees in which order a CoProcessFunction (or any other Co*Function) ingests the data. Maintaining some kind of deterministic order across distributed, parallel tasks would be too expensive.
Instead, you have to work around that in your function with state and possibly timers. The LRUCache in your function should be maintained as state (probably keyed state). Otherwise, it will be lost in case of a failure. You can add another state for the first stream and buffer records until the lookup value from the second stream has arrived.
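The buffering idea can be sketched in plain Java to show the logic only; this is not Flink API code. The two maps stand in for per-key state (in Flink you would typically key both inputs by memberId and use something like ValueState for the LMS record and ListState for the buffered offers), and the output list stands in for the Collector. The names BufferingJoin, onOffer and onLms are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "buffer input 1 until the lookup value from input 2 arrives".
// In Flink these maps would be state scoped to the current memberId,
// not fields shared by all keys.
class BufferingJoin {
    private final Map<String, String> lmsByMember = new HashMap<>();         // lookup value (input 2)
    private final Map<String, List<String>> pendingOffers = new HashMap<>(); // buffered input 1
    private final List<String> output = new ArrayList<>();                   // stand-in for the Collector

    // Corresponds to processElement1: an offer for memberId arrives.
    void onOffer(String memberId, String offer) {
        String lms = lmsByMember.get(memberId);
        if (lms == null) {
            // Lookup value not there yet: keep the offer instead of losing it.
            pendingOffers.computeIfAbsent(memberId, k -> new ArrayList<>()).add(offer);
        } else {
            output.add(memberId + ":" + offer + "+" + lms);
        }
    }

    // Corresponds to processElement2: LMS data for memberId arrives.
    void onLms(String memberId, String lms) {
        lmsByMember.put(memberId, lms);
        // Flush every offer that was waiting for this member's LMS data, in arrival order.
        List<String> waiting = pendingOffers.remove(memberId);
        if (waiting != null) {
            for (String offer : waiting) {
                output.add(memberId + ":" + offer + "+" + lms);
            }
        }
    }

    List<String> output() {
        return output;
    }
}
```

Whichever input arrives first, the same joined records come out. A timer could be added to clear buffers for members whose LMS data never arrives.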
Related
I have a question on the use of IO operations within java.util.function.Predicate. Please consider the following example:
public class ClientGroupFilter implements Predicate<Client> {
private GroupMapper mapper;
private List<String> validGroupNames = new ArrayList<>();
public ClientGroupFilter(GroupMapper mapper) {
this.mapper = mapper;
}
@Override
public boolean test(Client client) {
// this is a database call
Set<Integer> validIds = mapper.getValidIdsForGroupNames(validGroupNames);
return client.getGroupIds().stream().anyMatch(validIds::contains);
}
public void permit(String name) {
validGroupNames.add(name);
}
}
As you can see, this filter accepts any number of server group names, which are resolved by the mapper when a specific client is tested. If the client owns one of the valid server groups, true is returned.
Now, of course, this is very inefficient if the filter is applied to multiple clients, since every call to test makes the database call again. So refactoring led me to this:
public class ClientGroupFilter implements Predicate<Client> {
private GroupMapper mapper;
private List<String> validGroupNames = new ArrayList<>();
private boolean updateRequired = true;
private Set<Integer> validIds = new HashSet<>();
public ClientGroupFilter(GroupMapper mapper) {
this.mapper = mapper;
}
@Override
public boolean test(Client client) {
if(updateRequired) {
// this is a database call
validIds = mapper.getValidIdsForGroupNames(validGroupNames);
updateRequired = false;
}
return client.getGroupIds().stream().anyMatch(validIds::contains);
}
public void permit(String name) {
validGroupNames.add(name);
updateRequired = true;
}
}
The performance is a lot better, of course, but I'm still not happy with the solution, since I feel like java.util.function.Predicate should not be used like this. However, I still want to be able to provide a fast way to filter a list of clients, without requiring the consumer to map the server group names to their ids.
Does anyone have a better idea to refactor this?
If your usage pattern is such that you call permit several times, and then use Predicate<Client> without calling permit again, you can separate the code that collects validGroupNames from the code of your predicate by using a builder:
class ClientGroupFilterBuilder {
private final GroupMapper mapper;
private List<String> validGroupNames = new ArrayList<>();
public ClientGroupFilterBuilder(GroupMapper mapper) {
this.mapper = mapper;
}
public void permit(String name) {
validGroupNames.add(name);
}
public Predicate<Client> build() {
final Set<Integer> validIds = mapper.getValidIdsForGroupNames(validGroupNames);
return new Predicate<Client>() {
@Override
public boolean test(Client client) {
return client.getGroupIds().stream().anyMatch(validIds::contains);
}
};
}
}
This restricts building of validIds to the point where we construct the Predicate<Client>. Once the predicate is constructed, no further input is necessary.
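A self-contained version of the builder approach, written with a lambda instead of the anonymous class. GroupMapper and Client are minimal stand-ins for the question's types (their shapes are assumed), and permit returning this for chaining is an addition not present in the answer:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in: resolves group names to ids (the expensive database call).
interface GroupMapper {
    Set<Integer> getValidIdsForGroupNames(List<String> groupNames);
}

// Hypothetical stand-in for the question's Client type.
class Client {
    private final Set<Integer> groupIds;

    Client(Integer... groupIds) {
        this.groupIds = new HashSet<>(Arrays.asList(groupIds));
    }

    Set<Integer> getGroupIds() {
        return groupIds;
    }
}

class ClientGroupFilterBuilder {
    private final GroupMapper mapper;
    private final List<String> validGroupNames = new ArrayList<>();

    ClientGroupFilterBuilder(GroupMapper mapper) {
        this.mapper = mapper;
    }

    // Only collects names; nothing is resolved yet.
    ClientGroupFilterBuilder permit(String name) {
        validGroupNames.add(name);
        return this;
    }

    // Resolves names to ids exactly once and captures the result in the predicate.
    Predicate<Client> build() {
        final Set<Integer> validIds =
                new HashSet<>(mapper.getValidIdsForGroupNames(new ArrayList<>(validGroupNames)));
        return client -> client.getGroupIds().stream().anyMatch(validIds::contains);
    }
}
```

The mapper (the database call) runs once per build(), no matter how many clients the predicate tests afterwards.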
I want to get the total memory cost of a distributed map in Hazelcast.
I have tried the following:
LocalMapStats mapStatistics = cache.getLocalMapStats();
this.heapCost=mapStatistics.getHeapCost();
This gives the cost of the map on the local node only.
Can anyone help me get the total memory cost of the map across all nodes in Hazelcast?
As per the comment below, I have tried an ExecutorService.
My Callable class is:
public class DistrubutedMapStats implements Callable<String>, Serializable{
/**
*
*/
private static final long serialVersionUID = 1L;
String cacheMapName = null;
// 0 means no heap Cost.
/** The heap Cost. */
protected long heapCost = 0;
private String instanceName;
transient HazelcastInstance hazelcastInstance;
public DistrubutedMapStats() {
}
public DistrubutedMapStats(String cacheMapName,String instanceName) {
this.cacheMapName = cacheMapName;
this.instanceName=instanceName;
hazelcastInstance=Hazelcast.getHazelcastInstanceByName(instanceName);
}
public String call() {
System.out.println("HazelcastInstance Details="+hazelcastInstance.getName());
LocalMapStats mapStatistics = hazelcastInstance.getMap(cacheMapName).getLocalMapStats();
heapCost = mapStatistics.getHeapCost();
System.out.println("CacheName="+cacheMapName+" HeapCost="+heapCost);
return ""+heapCost;
}
}
and the calling method is:
private void getHeapCostFromMembers(String cacheName, Set<Member> members) throws Exception {
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
DistrubutedMapStats distrubutedMapStats=new DistrubutedMapStats(cacheName,hazelcastInstance.getName());
Map<Member, Future<String>> futures = executorService.submitToMembers(distrubutedMapStats, members);
for (Future<String> future : futures.values()) {
String echoResult = future.get();
System.out.println("HEAP COST="+echoResult);
// ...
}
}
but I get the following error when running it:
java.util.concurrent.ExecutionException: java.lang.NullPointerException: while trying to invoke the method com.hazelcast.core.HazelcastInstance.getName() of a null object loaded from field DistrubutedMapStats.hazelcastInstance of an object loaded from local variable 'this'
You probably want to use an ExecutorService (hazelcastInstance::getExecutorService) to run the operation on all nodes and sum up the result, if that makes sense.
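The fan-out-and-sum idea in plain java.util.concurrent, as a sketch without Hazelcast: each Callable stands in for reading getLocalMapStats() on one member, and ClusterSum and sumAll are hypothetical names. In the question's code the futures come from IExecutorService.submitToMembers, but the summing loop is the same. Returning a Long instead of a String also avoids the parse step:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Each Callable stands in for "read the local map cost on one member".
final class ClusterSum {
    private ClusterSum() {}

    static long sumAll(ExecutorService executor, List<Callable<Long>> perMember)
            throws InterruptedException, ExecutionException {
        long total = 0;
        // invokeAll blocks until every task has completed (normally or exceptionally)
        for (Future<Long> f : executor.invokeAll(perMember)) {
            total += f.get(); // a failure on one "member" surfaces as ExecutionException
        }
        return total;
    }
}
```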
Callable class:
public class DistrubutedMapStats implements Callable<String>, Serializable,HazelcastInstanceAware{
private static final long serialVersionUID = 1L;
String cacheMapName = null;
// 0 means no heap Cost.
/** The heap Cost. */
protected long heapCost = 0;
protected long totalHeapCost = 0;
protected long backupHeapCost = 0;
public transient HazelcastInstance hazelcastInstance;
public DistrubutedMapStats() {
}
public DistrubutedMapStats(String cacheMapName) {
this.cacheMapName = cacheMapName;
}
public String call() {
LocalMapStats mapStatistics = hazelcastInstance.getMap(cacheMapName).getLocalMapStats();
heapCost = mapStatistics.getHeapCost();
backupHeapCost=mapStatistics.getBackupEntryMemoryCost();
totalHeapCost=heapCost-backupHeapCost;
System.out.println("CacheName="+cacheMapName+" Total Cost="+heapCost+" HeapCost="+totalHeapCost+" BackupHeapCost="+backupHeapCost+" from Member");
return ""+totalHeapCost;
}
@Override
public void setHazelcastInstance(HazelcastInstance hazelcastInstance) {
this.hazelcastInstance=hazelcastInstance;
}
}
Calling method:
private long getHeapCostFromMembers(String cacheName, Set<Member> members) throws Exception {
long totalCacheHeapCost=0;
members=hazelcastInstance.getCluster().getMembers();
IExecutorService executorService = hazelcastInstance.getExecutorService("default");
DistrubutedMapStats distrubutedMapStats=new DistrubutedMapStats(cacheName);
distrubutedMapStats.setHazelcastInstance(hazelcastInstance);
System.out.println("Total Members in Cloud="+members.size());
Map<Member, Future<String>> futures = executorService.submitToMembers(distrubutedMapStats, members);
int i=0;
for (Future<String> future : futures.values())
{
i++;
String heapCostFromMembers = future.get();
System.out.println("HEAP COST "+"For Cache "+cacheName+" is"+" of Member="+i+" is "+heapCostFromMembers);
if(!heapCostFromMembers.isEmpty())
{
totalCacheHeapCost+=Long.parseLong(heapCostFromMembers);
}
// ...
}
System.out.println("Total HEAP COST "+"For Cache "+cacheName+" is"+" of Members="+members.size()+" is "+totalCacheHeapCost);
return totalCacheHeapCost;
}
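Why the first version threw the NullPointerException: the task is serialized to be sent to the other members, hazelcastInstance is transient and therefore not serialized, and deserialization does not run the constructor that assigned it, so on the receiving member the field is null. Implementing HazelcastInstanceAware fixes this because Hazelcast injects the local instance into the deserialized task through setHazelcastInstance before running it; the explicit setHazelcastInstance call in the calling method only sets the local copy. The plain-Java round trip below (no Hazelcast; the class names are made up) shows the transient part, assuming the task goes through standard Java serialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Mirrors DistrubutedMapStats: one serialized field, one transient field.
class TaskWithTransientField implements Serializable {
    private static final long serialVersionUID = 1L;
    final String mapName;      // serialized
    transient Object instance; // skipped, like hazelcastInstance in the question

    TaskWithTransientField(String mapName, Object instance) {
        this.mapName = mapName;
        this.instance = instance; // the constructor does not run on deserialization
    }
}

final class RoundTrip {
    private RoundTrip() {}

    // Serialize and deserialize, roughly what happens when a task is sent to another member.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```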
I have the following set of classes (along with a failing unit test):
Sprocket:
public class Sprocket {
private int serialNumber;
public Sprocket(int serialNumber) {
this.serialNumber = serialNumber;
}
@Override
public String toString() {
return "sprocket number " + serialNumber;
}
}
SlowSprocketFactory:
public class SlowSprocketFactory {
private final AtomicInteger maxSerialNumber = new AtomicInteger();
public Sprocket createSprocket() {
// clang, click, whistle, pop and other expensive onomatopoeic operations
int serialNumber = maxSerialNumber.incrementAndGet();
return new Sprocket(serialNumber);
}
public int getMaxSerialNumber() {
return maxSerialNumber.get();
}
}
SprocketCache:
public class SprocketCache {
private SlowSprocketFactory sprocketFactory;
private Sprocket sprocket;
public SprocketCache(SlowSprocketFactory sprocketFactory) {
this.sprocketFactory = sprocketFactory;
}
public Sprocket get(Object key) {
if (sprocket == null) {
sprocket = sprocketFactory.createSprocket();
}
return sprocket;
}
}
TestSprocketCache unit test:
public class TestSprocketCache {
private SlowSprocketFactory sprocketFactory = new SlowSprocketFactory();
@Test
public void testCacheReturnsASprocket() {
SprocketCache cache = new SprocketCache(sprocketFactory);
Sprocket sprocket = cache.get("key");
assertNotNull(sprocket);
}
@Test
public void testCacheReturnsSameObjectForSameKey() {
SprocketCache cache = new SprocketCache(sprocketFactory);
Sprocket sprocket1 = cache.get("key");
Sprocket sprocket2 = cache.get("key");
assertEquals("cache should return the same object for the same key", sprocket1, sprocket2);
assertEquals("factory's create method should be called once only", 1, sprocketFactory.getMaxSerialNumber());
}
}
The TestSprocketCache unit test always shows a green bar, even if I change the test as follows:
Sprocket sprocket1 = cache.get("key");
Sprocket sprocket2 = cache.get("pizza");
I am guessing that I have to use HashMap.containsKey(key) inside the SprocketCache.get() method, but I can't figure out the logic.
The problem you're having here is that your get(Object) implementation only allows one instance to be created:
public Sprocket get(Object key) {
// Creates object if it doesn't exist yet
if (sprocket == null) {
sprocket = sprocketFactory.createSprocket();
}
return sprocket;
}
This is a typical lazy-loading singleton pattern. On the first call, sprocket is null, so an instance is created and assigned to it; on every later call, sprocket is already set and the instantiation is skipped completely. Note that you don't use the key parameter at all, so it has no effect on the result.
Using a Map would indeed be one way to achieve your objective:
public class SprocketCache {
private SlowSprocketFactory sprocketFactory;
private Map<Object, Sprocket> instances = new HashMap<Object, Sprocket>();
public SprocketCache(SlowSprocketFactory sprocketFactory) {
this.sprocketFactory = sprocketFactory;
}
public Sprocket get(Object key) {
if (!instances.containsKey(key)) {
instances.put(key, sprocketFactory.createSprocket());
}
return instances.get(key);
}
}
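For reference, a compilable version of the Map-based fix, together with the check the original test suite was missing: two different keys must yield two different sprockets. It is written as a plain static method so it runs without JUnit; SprocketCacheChecks is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class Sprocket {
    private final int serialNumber;

    Sprocket(int serialNumber) {
        this.serialNumber = serialNumber;
    }

    @Override
    public String toString() {
        return "sprocket number " + serialNumber;
    }
}

class SlowSprocketFactory {
    private final AtomicInteger maxSerialNumber = new AtomicInteger();

    Sprocket createSprocket() {
        return new Sprocket(maxSerialNumber.incrementAndGet());
    }

    int getMaxSerialNumber() {
        return maxSerialNumber.get();
    }
}

// One sprocket per distinct key (not thread-safe).
class SprocketCache {
    private final SlowSprocketFactory sprocketFactory;
    private final Map<Object, Sprocket> instances = new HashMap<>();

    SprocketCache(SlowSprocketFactory sprocketFactory) {
        this.sprocketFactory = sprocketFactory;
    }

    Sprocket get(Object key) {
        Sprocket sprocket = instances.get(key);
        if (sprocket == null) { // first request for this key
            sprocket = sprocketFactory.createSprocket();
            instances.put(key, sprocket);
        }
        return sprocket;
    }
}

final class SprocketCacheChecks {
    private SprocketCacheChecks() {}

    // Same key: same object. Different key: different object. Factory called once per key.
    static boolean distinctKeysGiveDistinctSprockets() {
        SlowSprocketFactory factory = new SlowSprocketFactory();
        SprocketCache cache = new SprocketCache(factory);
        Sprocket first = cache.get("key");
        Sprocket again = cache.get("key");
        Sprocket other = cache.get("pizza");
        return first == again && first != other && factory.getMaxSerialNumber() == 2;
    }
}
```

In JUnit terms this is an assertNotSame on the "key" and "pizza" results plus an assertEquals(2, ...) on getMaxSerialNumber(); with the original single-field cache, such a test goes red.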
Well, your current cache implementation does not use the key, so it is no wonder that it always returns the same value, cached once.
If you want to store different values for keys, and assuming you want it to be thread safe, you might end up doing something like this:
public class SprocketCache {
private SlowSprocketFactory sprocketFactory;
private ConcurrentHashMap<Object, Sprocket> cache = new ConcurrentHashMap<>();
public SprocketCache(SlowSprocketFactory sprocketFactory) {
this.sprocketFactory = sprocketFactory;
}
public Sprocket get(Object key) {
if (!cache.containsKey(key)) {
// we only want to acquire the lock for the cache seed operation rather than for every get
synchronized (key){
// kind of double-checked locking, to make sure no other thread has populated the cache while we were waiting for the monitor to be released
if (!cache.containsKey(key)){
cache.putIfAbsent(key, sprocketFactory.createSprocket());
}
}
}
return cache.get(key);
}
}
A couple of important side notes:
you need a ConcurrentHashMap for its happens-before guarantee, so that other threads immediately see when the cache has been filled;
creating a new cache value has to be synchronized, so that concurrent threads don't each generate their own value and overwrite previous values in a race condition;
synchronization is relatively expensive, so we only want to engage it when needed; and because of the same race condition, several threads may pass the first check and then wait for the monitor one after another. That is why another check is required AFTER entering the synchronized block, to make sure that another thread hasn't already filled in the value.
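On Java 8 and later, ConcurrentHashMap.computeIfAbsent performs the check-and-create step atomically for each key (its Javadoc states that the mapping function is applied at most once per key), so the explicit double check is not needed. It also avoids synchronizing on the caller's key object, which only helps when all callers pass the very same instance. A sketch that swaps in computeIfAbsent for the double-checked locking above; ConcurrentSprocketCache is a made-up name:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class Sprocket {
    private final int serialNumber;

    Sprocket(int serialNumber) {
        this.serialNumber = serialNumber;
    }

    @Override
    public String toString() {
        return "sprocket number " + serialNumber;
    }
}

class SlowSprocketFactory {
    private final AtomicInteger maxSerialNumber = new AtomicInteger();

    Sprocket createSprocket() {
        return new Sprocket(maxSerialNumber.incrementAndGet());
    }

    int getMaxSerialNumber() {
        return maxSerialNumber.get();
    }
}

class ConcurrentSprocketCache {
    private final SlowSprocketFactory sprocketFactory;
    // Declared as ConcurrentHashMap on purpose: its computeIfAbsent is atomic, whereas the
    // default method on the ConcurrentMap interface may call the function more than once.
    private final ConcurrentHashMap<Object, Sprocket> cache = new ConcurrentHashMap<>();

    ConcurrentSprocketCache(SlowSprocketFactory sprocketFactory) {
        this.sprocketFactory = sprocketFactory;
    }

    Sprocket get(Object key) {
        // The factory runs at most once per key, even when several threads
        // request the same key at the same moment; no lock is shared by all keys.
        return cache.computeIfAbsent(key, k -> sprocketFactory.createSprocket());
    }
}
```

One caveat from the same Javadoc: some updates by other threads may block while the computation is in progress, so a truly slow factory call holds up writers that hit the same part of the map.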
I have a library that is used by customers; they pass a DataRequest object which has userid, timeout and some other fields. I use this DataRequest object to build a URL and then make an HTTP call using RestTemplate. My service returns a JSON response, which I use to build a DataResponse object that I return to them.
Below is my DataClient class, which customers use by passing a DataRequest object to it. I use the timeout value passed by the customer in DataRequest to time out the request in the getSyncData method if it takes too long.
public class DataClient implements Client {
private RestTemplate restTemplate = new RestTemplate();
// first executor
private ExecutorService service = Executors.newFixedThreadPool(15);
@Override
public DataResponse getSyncData(DataRequest key) {
DataResponse response = null;
Future<DataResponse> responseFuture = null;
try {
responseFuture = getAsyncData(key);
response = responseFuture.get(key.getTimeout(), key.getTimeoutUnit());
} catch (TimeoutException ex) {
response = new DataResponse(DataErrorEnum.CLIENT_TIMEOUT, DataStatusEnum.ERROR);
responseFuture.cancel(true);
// logging exception here
}
return response;
}
@Override
public Future<DataResponse> getAsyncData(DataRequest key) {
DataFetcherTask task = new DataFetcherTask(key, restTemplate);
Future<DataResponse> future = service.submit(task);
return future;
}
}
DataFetcherTask class:
public class DataFetcherTask implements Callable<DataResponse> {
private DataRequest key;
private RestTemplate restTemplate;
public DataFetcherTask(DataRequest key, RestTemplate restTemplate) {
this.key = key;
this.restTemplate = restTemplate;
}
@Override
public DataResponse call() throws Exception {
// In a nutshell, this is what I am doing here:
// 1. Make a URL using the DataRequest key.
// 2. Execute the URL with RestTemplate.
// 3. Build a DataResponse object and return it.
// I refer to this whole logic in the call method as LogicA.
return null; // placeholder for the DataResponse built by LogicA
}
}
As of now, my DataFetcherTask class is responsible for one DataRequest key, as shown above.
Problem statement:
Now I have a small design change. A customer will pass a DataRequest object (for example keyA) to my library, and then I will make a new HTTP call to another service (which I am not doing in my current design) using the user id present in the DataRequest (keyA) object. That call gives me back a list of user ids, which I will use to make a few other DataRequest objects (keyB, keyC, keyD), one for each user id returned in the response. Then I will have a List<DataRequest> containing the keyB, keyC and keyD DataRequest objects. The List<DataRequest> will have at most three elements.
Now, for each DataRequest object in the List<DataRequest>, I want to execute the above DataFetcherTask.call method in parallel and then build a List<DataResponse> by adding the DataResponse for each key. So I will have three parallel calls to DataFetcherTask.call. The idea behind these parallel calls is to get the data for all (at most three) keys within the same global timeout.
So my proposal is: the DataFetcherTask class will return a List<DataResponse> object instead of a DataResponse, and the signatures of the getSyncData and getAsyncData methods will change as well. Here is the algorithm:
1. Use the DataRequest object passed by the customer to build a List<DataRequest> by calling another HTTP service.
2. Make a parallel call to the DataFetcherTask.call method for each DataRequest in the List<DataRequest>, and return a List<DataResponse> object to the customer instead of a DataResponse.
This way, I can apply the same global timeout to step 1 as well as step 2. If either step takes too long, we simply time out in the getSyncData method.
DataFetcherTask class after the design change:
public class DataFetcherTask implements Callable<List<DataResponse>> {
private DataRequest key;
private RestTemplate restTemplate;
// second executor here
private ExecutorService executorService = Executors.newFixedThreadPool(10);
public DataFetcherTask(DataRequest key, RestTemplate restTemplate) {
this.key = key;
this.restTemplate = restTemplate;
}
@Override
public List<DataResponse> call() throws Exception {
List<DataRequest> keys = generateKeys();
CompletionService<DataResponse> comp = new ExecutorCompletionService<>(executorService);
int count = 0;
for (final DataRequest key : keys) {
comp.submit(new Callable<DataResponse>() {
@Override
public DataResponse call() throws Exception {
return performDataRequest(key);
}
});
count++; // number of submitted tasks; each one is take()n below
}
List<DataResponse> responseList = new ArrayList<DataResponse>();
while (count-- > 0) {
Future<DataResponse> future = comp.take();
responseList.add(future.get());
}
return responseList;
}
// In this method I am making a HTTP call to another service
// and then I will make List<DataRequest> accordingly.
private List<DataRequest> generateKeys() {
List<DataRequest> keys = new ArrayList<>();
// use the key object passed in the constructor to make an HTTP call to another service
// and then make a List of DataRequest objects and return keys.
return keys;
}
private DataResponse performDataRequest(DataRequest key) {
// This will have all the LogicA code shown in my original design.
// Everything stays the same.
return null; // placeholder for the DataResponse built by LogicA
}
}
Now my question is:
Does it have to be like this? What is the right design to solve this problem? I mean, having a call method inside another call method looks weird.
Do we need to have two executors like I have in my code? Is there any better way to solve this problem, or any kind of simplification or design change we can make here?
I have simplified the code so that it is clear what I am trying to do.
As already mentioned in the comments of your question, you can use Java's ForkJoin framework. This will save you the extra thread pool within your DataFetcherTask.
You simply need to use a ForkJoinPool in your DataClient and convert your DataFetcherTask into a RecursiveTask (one of ForkJoinTask's subtypes). This allows you to easily execute other subtasks in parallel.
So, after these modifications your code will look something like this:
DataFetcherTask
The DataFetcherTask is now a RecursiveTask which first generates the keys and invokes subtasks for each generated key. These subtasks are executed in the same ForkJoinPool as the parent task.
public class DataFetcherTask extends RecursiveTask<List<DataResponse>> {
private final DataRequest key;
private final RestTemplate restTemplate;
public DataFetcherTask(DataRequest key, RestTemplate restTemplate) {
this.key = key;
this.restTemplate = restTemplate;
}
@Override
protected List<DataResponse> compute() {
// Create subtasks for the key and invoke them
List<DataRequestTask> requestTasks = requestTasks(generateKeys());
invokeAll(requestTasks);
// All tasks are finished if invokeAll() returns.
List<DataResponse> responseList = new ArrayList<>(requestTasks.size());
for (DataRequestTask task : requestTasks) {
try {
responseList.add(task.get());
} catch (InterruptedException | ExecutionException e) {
// TODO - Handle exception properly
Thread.currentThread().interrupt();
return Collections.emptyList();
}
}
return responseList;
}
private List<DataRequestTask> requestTasks(List<DataRequest> keys) {
List<DataRequestTask> tasks = new ArrayList<>(keys.size());
for (DataRequest key : keys) {
tasks.add(new DataRequestTask(key));
}
return tasks;
}
// In this method I am making a HTTP call to another service
// and then I will make List<DataRequest> accordingly.
private List<DataRequest> generateKeys() {
List<DataRequest> keys = new ArrayList<>();
// use the key object passed in the constructor to make an HTTP call to another service
// and then make List of DataRequest object and return keys.
return keys;
}
/** Inner class for the subtasks. */
private static class DataRequestTask extends RecursiveTask<DataResponse> {
private final DataRequest request;
public DataRequestTask(DataRequest request) {
this.request = request;
}
@Override
protected DataResponse compute() {
return performDataRequest(this.request);
}
private DataResponse performDataRequest(DataRequest key) {
// This will have all LogicA code here which is shown in my original design.
// everything as it is same..
return new DataResponse(DataErrorEnum.OK, DataStatusEnum.OK);
}
}
}
DataClient
The DataClient will not change much except for the new thread pool:
public class DataClient implements Client {
private final RestTemplate restTemplate = new RestTemplate();
// Replace the ExecutorService with a ForkJoinPool
private final ForkJoinPool service = new ForkJoinPool(15);
@Override
public List<DataResponse> getSyncData(DataRequest key) {
List<DataResponse> responseList = null;
Future<List<DataResponse>> responseFuture = null;
try {
responseFuture = getAsyncData(key);
responseList = responseFuture.get(key.getTimeout(), key.getTimeoutUnit());
} catch (TimeoutException | ExecutionException | InterruptedException ex) {
if (ex instanceof InterruptedException) {
Thread.currentThread().interrupt(); // keep the interrupt status for the caller
}
responseList = Collections.singletonList(new DataResponse(DataErrorEnum.CLIENT_TIMEOUT, DataStatusEnum.ERROR));
responseFuture.cancel(true);
// logging exception here
}
return responseList;
}
@Override
public Future<List<DataResponse>> getAsyncData(DataRequest key) {
DataFetcherTask task = new DataFetcherTask(key, this.restTemplate);
return this.service.submit(task);
}
}
Once you are on Java 8, you may consider changing the implementation to use CompletableFutures. Then it would look something like this:
DataClientCF
public class DataClientCF {
private final RestTemplate restTemplate = new RestTemplate();
private final ExecutorService executor = Executors.newFixedThreadPool(15);
public List<DataResponse> getData(DataRequest initialKey) {
return CompletableFuture.supplyAsync(() -> generateKeys(initialKey), this.executor)
.thenApply(requests -> requests.stream().map(this::supplyRequestAsync).collect(Collectors.toList()))
.thenApply(responseFutures -> responseFutures.stream().map(future -> future.join()).collect(Collectors.toList()))
.exceptionally(t -> { throw new RuntimeException(t); })
.join();
}
private List<DataRequest> generateKeys(DataRequest key) {
return new ArrayList<>();
}
private CompletableFuture<DataResponse> supplyRequestAsync(DataRequest key) {
return CompletableFuture.supplyAsync(() -> new DataResponse(DataErrorEnum.OK, DataStatusEnum.OK), this.executor);
}
}
As mentioned in the comments, Guava's ListenableFutures would provide similar functionality for Java 7, but without lambdas they tend to get clumsy.
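A variation on the CompletableFuture version that also applies the single global timeout from the question: fan out with supplyAsync, combine with CompletableFuture.allOf, and wait with get(timeout, unit) on the combined future. FanOut and fetchAll are hypothetical names, the fetch function stands in for performDataRequest, and the key-generation step is left out:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FanOut {
    private FanOut() {}

    static <Q, R> List<R> fetchAll(List<Q> requests, Function<Q, R> fetch,
                                   ExecutorService executor, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Start one asynchronous fetch per request on the shared executor.
        List<CompletableFuture<R>> futures = requests.stream()
                .map(q -> CompletableFuture.supplyAsync(() -> fetch.apply(q), executor))
                .collect(Collectors.toList());
        // Completes when every fetch has completed (exceptionally if any fetch failed).
        CompletableFuture<Void> all = CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        try {
            all.get(timeout, unit); // one deadline for the whole batch
        } catch (TimeoutException e) {
            // Marks the futures as cancelled; for CompletableFuture this does not
            // interrupt a supplier that is already running.
            futures.forEach(f -> f.cancel(true));
            throw e;
        }
        // All futures are complete here, so join() returns without blocking.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }
}
```

Joining only after allOf has completed avoids parking a pool thread on the sub-futures, which the thenApply(... join() ...) chain above does while the requests are still running.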
As far as I know, RestTemplate is blocking, and the ForkJoinTask JavaDoc says:
Computations should avoid synchronized methods or blocks, and should minimize other blocking synchronization apart from joining other tasks or using synchronizers such as Phasers that are advertised to cooperate with fork/join scheduling. ...
Tasks should also not perform blocking IO,...
Calling call from within call is redundant.
You don't need two executors either. You can also return a partial result from getSyncData(DataRequest key). This can be done like this:
DataClient.java
public class DataClient implements Client {
private RestTemplate restTemplate = new RestTemplate();
// first executor
private ExecutorService service = Executors.newFixedThreadPool(15);
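@Override
public List<DataResponse> getSyncData(DataRequest key) {
List<DataResponse> responseList = null;
DataFetcherResult response = null;
try {
response = getAsyncData(key);
responseList = response.get(key.getTimeout(), key.getTimeoutUnit());
} catch (TimeoutException ex) {
response.cancel(true);
responseList = response.getPartialResult();
} catch (InterruptedException ex) {
// get(...) also declares InterruptedException and ExecutionException
Thread.currentThread().interrupt();
response.cancel(true);
responseList = response.getPartialResult();
} catch (ExecutionException ex) {
// logging exception here
responseList = response.getPartialResult();
}
return responseList;
}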
@Override
public DataFetcherResult getAsyncData(DataRequest key) {
List<DataRequest> keys = generateKeys(key);
final List<Future<DataResponse>> responseList = new ArrayList<>();
final CountDownLatch latch = new CountDownLatch(keys.size());//assume keys is not null
for (final DataRequest _key : keys) {
responseList.add(service.submit(new Callable<DataResponse>() {
@Override
public DataResponse call() throws Exception {
try {
return performDataRequest(_key);
} finally {
// count down whether the request succeeded or failed; returning from
// finally would swallow the exception, so the return stays in the try block
latch.countDown();
}
}
}));
}
return new DataFetcherResult(responseList, latch);
}
// In this method I am making a HTTP call to another service
// and then I will make List<DataRequest> accordingly.
private List<DataRequest> generateKeys(DataRequest key) {
List<DataRequest> keys = new ArrayList<>();
// use the key object passed in the constructor to make an HTTP call to another service
// and then make List of DataRequest object and return keys.
return keys;
}
private DataResponse performDataRequest(DataRequest key) {
// This will have all LogicA code here which is shown in my original design.
// everything as it is same..
return null;
}
}
DataFetcherResult.java
public class DataFetcherResult implements Future<List<DataResponse>> {
final List<Future<DataResponse>> futures;
final CountDownLatch latch;
public DataFetcherResult(List<Future<DataResponse>> futures, CountDownLatch latch) {
this.futures = futures;
this.latch = latch;
}
//non-blocking
public List<DataResponse> getPartialResult() {
List<DataResponse> result = new ArrayList<>(futures.size());
for (Future<DataResponse> future : futures) {
try {
result.add(future.isDone() ? future.get() : null);
//instead of null you can return new DataResponse(DataErrorEnum.NOT_READY, DataStatusEnum.ERROR);
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
//ExecutionException or CancellationException could be thrown, especially if DataFetcherResult was cancelled
//you can handle them here and return DataResponse with corresponding DataErrorEnum and DataStatusEnum
}
}
return result;
}
@Override
public List<DataResponse> get() throws ExecutionException, InterruptedException {
List<DataResponse> result = new ArrayList<>(futures.size());
for (Future<DataResponse> future : futures) {
result.add(future.get());
}
return result;
}
@Override
public List<DataResponse> get(long timeout, TimeUnit timeUnit)
throws ExecutionException, InterruptedException, TimeoutException {
if (latch.await(timeout, timeUnit)) {
return get();
}
throw new TimeoutException();//or getPartialResult()
}
@Override
public boolean cancel(boolean mayInterruptIfRunning) {
boolean cancelled = true;
for (Future<DataResponse> future : futures) {
cancelled &= future.cancel(mayInterruptIfRunning);
}
return cancelled;
}
@Override
public boolean isCancelled() {
boolean cancelled = true;
for (Future<DataResponse> future : futures) {
cancelled &= future.isCancelled();
}
return cancelled;
}
@Override
public boolean isDone() {
boolean done = true;
for (Future<DataResponse> future : futures) {
done &= future.isDone();
}
return done;
}
//and etc.
}
I wrote it with a CountDownLatch and it looks fine, but note that there is a nuance.
You can get stuck for a little while in DataFetcherResult.get(long timeout, TimeUnit timeUnit), because the CountDownLatch is not synchronized with the futures' state. It can happen that latch.getCount() == 0 while not all futures return future.isDone() == true yet: a task may already have executed latch.countDown() in the Callable's finally {} block, while its future has not yet updated its internal state, which still equals NEW.
So calling get() inside get(long timeout, TimeUnit timeUnit) can cause a small delay.
A similar case was described here.
The timed DataFetcherResult.get(...) could be rewritten using each future's get(long timeout, TimeUnit timeUnit), and then you can remove the CountDownLatch from the class.
public List<DataResponse> get(long timeout, TimeUnit timeUnit)
throws ExecutionException, InterruptedException{
List<DataResponse> result = new ArrayList<>(futures.size());
long timeoutMs = timeUnit.toMillis(timeout);
boolean timedOut = false; // set once the overall deadline has been exceeded
for (Future<DataResponse> future : futures) {
long beforeGet = System.currentTimeMillis();
try {
if (!timedOut && timeoutMs > 0) {
result.add(future.get(timeoutMs, TimeUnit.MILLISECONDS));
timeoutMs -= System.currentTimeMillis() - beforeGet;
} else {
if (future.isDone()) {
result.add(future.get());
} else {
//result.add(new DataResponse(DataErrorEnum.NOT_READY, DataStatusEnum.ERROR)); ?
}
}
} catch (TimeoutException e) {
result.add(new DataResponse(DataErrorEnum.TIMEOUT, DataStatusEnum.ERROR));
timedOut = true;
}
//you can also handle ExecutionException or CancellationException here
}
return result;
}
This code is meant as an example and should be tested before being used in production, but it seems legit :)
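The same shared-deadline idea can be written as a small generic helper. This is a sketch under a few assumptions: the class DeadlineGet and its method getAll are hypothetical, null stands in for the DataResponse error object used above, and the remaining time is computed from one fixed deadline with System.nanoTime() (which is monotonic) rather than by subtracting System.currentTimeMillis() deltas after each call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class DeadlineGet {
    // Waits for each future in turn, all sharing one overall deadline.
    // A future that has not completed by the deadline contributes null (hypothetical convention).
    public static <T> List<T> getAll(List<? extends Future<T>> futures, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        List<T> result = new ArrayList<>(futures.size());
        final long deadline = System.nanoTime() + unit.toNanos(timeout);
        for (Future<T> future : futures) {
            long remaining = deadline - System.nanoTime();
            try {
                if (remaining > 0) {
                    // wait at most for what is left of the shared budget
                    result.add(future.get(remaining, TimeUnit.NANOSECONDS));
                } else if (future.isDone()) {
                    result.add(future.get()); // already finished, so this does not block
                } else {
                    result.add(null);         // deadline passed and still running
                }
            } catch (TimeoutException e) {
                result.add(null);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<String> fast = pool.submit(() -> "fast");
        Future<String> slow = pool.submit(() -> { Thread.sleep(5_000); return "slow"; });
        System.out.println(getAll(List.of(fast, slow), 200, TimeUnit.MILLISECONDS));
        pool.shutdownNow(); // interrupts the slow task
    }
}
```

Because each get(...) only waits for what remains of a single deadline, the total wait stays bounded by the requested timeout (plus scheduling jitter), no matter how many futures are in the list.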
Inspired by a comment on a given answer, I tried to create a thread-safe implementation of the multiton pattern that relies on unique keys and locks on them (I got the idea from JB Nizet's answer to this question).
Question
Is the implementation I provided viable?
I'm not interested in whether Multiton (or Singleton) is a good pattern in general; that would only start a discussion. I just want a clean and working implementation.
Cons:
You have to know how many instances you want to create at compile time.
Pros:
No lock on the whole class or the whole map. Concurrent calls to getInstance are possible.
Instances are retrieved via a key object, not just an unbounded int or String, so you can be sure to get a non-null instance after the method call.
Thread-safe (at least that's my impression).
public class Multiton
{
private static final Map<Enum<?>, Multiton> instances = new HashMap<Enum<?>, Multiton>();
private Multiton() {System.out.println("Created instance."); }
/* Can be called concurrently, since it only synchronizes on id */
public static <KEY extends Enum<?> & MultitionKey> Multiton getInstance(KEY id)
{
synchronized (id)
{
if (instances.get(id) == null)
instances.put(id, new Multiton());
}
System.out.println("Retrieved instance.");
return instances.get(id);
}
public interface MultitionKey { /* */ }
public static void main(String[] args) throws InterruptedException
{
//getInstance(Keys.KEY_1);
getInstance(OtherKeys.KEY_A);
Runnable r = new Runnable() {
@Override
public void run() { getInstance(Keys.KEY_1); }
};
int size = 100;
List<Thread> threads = new ArrayList<Thread>();
for (int i = 0; i < size; i++)
threads.add(new Thread(r));
for (Thread t : threads)
t.start();
for (Thread t : threads)
t.join();
}
enum Keys implements MultitionKey
{
KEY_1;
/* define more keys */
}
enum OtherKeys implements MultitionKey
{
KEY_A;
/* define more keys */
}
}
I tried to prevent the resizing of the map and the misuse of the enums I synchronize on.
It's more of a proof of concept, before I can get it over with! :)
public class Multiton
{
private static final Map<MultitionKey, Multiton> instances = new HashMap<MultitionKey, Multiton>((int) (Key.values().length/0.75f) + 1);
private static final Map<Key, MultitionKey> keyMap;
static
{
Map<Key, MultitionKey> map = new HashMap<Key, MultitionKey>();
map.put(Key.KEY_1, Keys.KEY_1);
map.put(Key.KEY_2, OtherKeys.KEY_A);
keyMap = Collections.unmodifiableMap(map);
}
public enum Key {
KEY_1, KEY_2;
}
private Multiton() {System.out.println("Created instance."); }
/* Can be called concurrently, since it only synchronizes on KEY */
public static <KEY extends Enum<?> & MultitionKey> Multiton getInstance(Key id)
{
@SuppressWarnings("unchecked")
KEY key = (KEY) keyMap.get(id);
synchronized (keyMap.get(id))
{
if (instances.get(key) == null)
instances.put(key, new Multiton());
}
System.out.println("Retrieved instance.");
return instances.get(key);
}
private interface MultitionKey { /* */ }
private enum Keys implements MultitionKey
{
KEY_1;
/* define more keys */
}
private enum OtherKeys implements MultitionKey
{
KEY_A;
/* define more keys */
}
}
It is absolutely not thread-safe. Here is a simple example of the many, many things that could go wrong.
Thread A is trying to put at key id1. Thread B is resizing the buckets table due to a put at id2. Because these have different synchronization monitors, they're off to the races in parallel.
Thread A                              Thread B
--------                              --------
b = key.hash % map.buckets.size
                                      copy map.buckets reference to local var
                                      set map.buckets = new Bucket[newSize]
                                      insert keys from old buckets into new buckets
insert into map.buckets[b]
In this example, let's say Thread A saw the map.buckets = new Bucket[newSize] modification. It's not guaranteed to (since there's no happens-before edge), but it may. In that case, it'll be inserting the (key, value) pair into the wrong bucket. Nobody will ever find it.
As a slight variant, if Thread A copied the map.buckets reference to a local var and did all its work on that, then it'd be inserting into the right bucket, but the wrong buckets table; it wouldn't be inserting into the new one that Thread B is about to install as the table for everyone to see. If the next operation on key 1 happens to see the new table (again, not guaranteed to but it may), then it won't see Thread A's actions because they were done on a long-forgotten buckets array.
I'd say not viable.
Synchronizing on the id parameter is fraught with danger: what if someone uses this enum for another synchronization mechanism? And of course HashMap is not safe for concurrent use, as the comments have pointed out.
To demonstrate, try this:
Runnable r = new Runnable() {
@Override
public void run() {
// Added to demonstrate the problem.
synchronized(Keys.KEY_1) {
getInstance(Keys.KEY_1);
}
}
};
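A self-contained version of that demonstration, as a sketch: the class ForeignLockDemo and its members are hypothetical, and getInstance is reduced to the question's synchronized (id) pattern with a single key. While unrelated code holds the monitor of Keys.KEY_1, getInstance cannot enter its synchronized block, so the measured wait should come out close to the hold time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ForeignLockDemo {
    enum Keys { KEY_1 }

    // Only one key exists here, so this HashMap is only ever touched under one monitor.
    private static final Map<Keys, Object> instances = new HashMap<>();

    // Mirrors the question's approach: the lock is the caller-visible enum constant itself.
    static Object getInstance(Keys id) {
        synchronized (id) {
            return instances.computeIfAbsent(id, k -> new Object());
        }
    }

    // Measures how long getInstance is blocked while other code holds KEY_1's monitor.
    static long blockedMillis(long holdMillis) throws InterruptedException {
        CountDownLatch locked = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (Keys.KEY_1) { // unrelated code using the same public enum as a lock
                locked.countDown();
                try {
                    Thread.sleep(holdMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        holder.start();
        locked.await();                 // the holder thread now owns the monitor
        long start = System.nanoTime();
        getInstance(Keys.KEY_1);        // blocks until the holder leaves its synchronized block
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        holder.join();
        return elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("getInstance waited about " + blockedMillis(500) + " ms");
    }
}
```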
Here's an implementation that uses atomics instead of synchronization and should therefore be more efficient. It is much more complicated than yours, but handling all of the edge cases in a Multiton IS complicated.
public class Multiton {
// The static instances.
private static final AtomicReferenceArray<Multiton> instances = new AtomicReferenceArray<>(1000);
// Ready for use - set to false while initialising.
private final AtomicBoolean ready = new AtomicBoolean();
// Everyone who is waiting for me to initialise.
private final Queue<Thread> waiters = new ConcurrentLinkedQueue<>();
// For logging (and a bit of linguistic fun).
private final int forInstance;
// We need a simple constructor.
private Multiton(int forInstance) {
this.forInstance = forInstance;
log(forInstance, "New");
}
// The expensive initialiser.
public void init() throws InterruptedException {
log(forInstance, "Init");
// ... presumably heavy stuff.
Thread.sleep(1000);
// We are now ready.
ready();
}
private void ready() {
log(forInstance, "Ready");
// I am now ready.
ready.getAndSet(true);
// Unpark everyone waiting for me.
for (Thread t : waiters) {
LockSupport.unpark(t);
}
}
// Get the instance for that one.
public static Multiton getInstance(int which) throws InterruptedException {
// One there already?
Multiton it = instances.get(which);
if (it == null) {
// Lazy make.
Multiton newIt = new Multiton(which);
// Successful put?
if (instances.compareAndSet(which, null, newIt)) {
// Yes!
it = newIt;
// Initialise it.
it.init();
} else {
// One appeared as if by magic (another thread got there first).
it = instances.get(which);
// Wait for it to finish initialisation.
// Put me in its queue of waiters.
it.waiters.add(Thread.currentThread());
log(which, "Parking");
while (!it.ready.get()) {
// Park me.
LockSupport.park();
}
// I'm not waiting any more.
it.waiters.remove(Thread.currentThread());
log(which, "Unparked");
}
}
return it;
}
// Some simple logging.
static void log(int which, String s) {
log(new Date(), "Thread " + Thread.currentThread().getId() + " for Multiton " + which + " " + s);
}
static final DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
// synchronized so I don't need to make the DateFormat ThreadLocal.
static synchronized void log(Date d, String s) {
System.out.println(dateFormat.format(d) + " " + s);
}
// The tester class.
static class MultitonTester implements Runnable {
int which;
private MultitonTester(int which) {
this.which = which;
}
@Override
public void run() {
try {
Multiton.log(which, "Waiting");
Multiton m = Multiton.getInstance(which);
Multiton.log(which, "Got");
} catch (InterruptedException ex) {
Multiton.log(which, "Interrupted");
}
}
}
public static void main(String[] args) throws InterruptedException {
int testers = 50;
int multitons = 50;
// Do a number of them. Makes n testers for each Multiton.
for (int i = 1; i < testers * multitons; i++) {
// Which one to create.
int which = i / testers;
//System.out.println("Requesting Multiton " + i);
new Thread(new MultitonTester(which+1)).start();
}
}
}
I'm not a Java programmer, but HashMap is not safe for concurrent access. I'd recommend ConcurrentHashMap instead:
private static final ConcurrentHashMap<Object, Multiton> instances = new ConcurrentHashMap<Object, Multiton>();
public static <KEY extends Enum<?> & MultitionKey> Multiton getInstance(KEY id)
{
Multiton result;
synchronized (id)
{
result = instances.get(id);
if(result == null)
{
result = new Multiton();
instances.put(id, result);
}
}
System.out.println("Retrieved instance.");
return result;
}
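Once the map is a ConcurrentHashMap, the synchronized (id) block is arguably not needed at all: ConcurrentHashMap.computeIfAbsent performs the check-and-insert atomically and applies the mapping function at most once per key. The following is a sketch along those lines, reusing the question's MultitionKey, Keys and OtherKeys names (spelling kept); it is an alternative to the code above, not the answerer's own version.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class Multiton {
    // Marker interface and key enums as in the question.
    public interface MultitionKey { }

    enum Keys implements MultitionKey { KEY_1 }

    enum OtherKeys implements MultitionKey { KEY_A }

    // No external monitor: the map itself guarantees an atomic create-if-absent.
    private static final ConcurrentMap<MultitionKey, Multiton> instances = new ConcurrentHashMap<>();

    private Multiton() {
        System.out.println("Created instance.");
    }

    public static <KEY extends Enum<?> & MultitionKey> Multiton getInstance(KEY id) {
        // The constructor runs at most once per key, even when many threads race here.
        return instances.computeIfAbsent(id, k -> new Multiton());
    }

    public static void main(String[] args) {
        System.out.println(getInstance(Keys.KEY_1) == getInstance(Keys.KEY_1));
        System.out.println(getInstance(Keys.KEY_1) == getInstance(OtherKeys.KEY_A));
    }
}
```

The trade-off is that other threads' updates to the map may be blocked while the constructor runs inside computeIfAbsent, so the constructor should stay short and must not touch instances itself; for an expensive initialiser, the atomics-based answer above avoids that.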