I am wondering what the best way would be (I am probably looking for a pattern) to ensure data consistency in an everyday scenario where a service retrieves data by invoking some method on a gateway (the gateway might be the boundary for a DB, a web call, or even an in-memory data structure).
Here is an example of the case I described in Java:
public class MyService {

    private DataSourceGateway dataSourceGateway;

    public MyService(DataSourceGateway dataSourceGateway) {
        this.dataSourceGateway = dataSourceGateway;
    }

    public void myServiceMethod() {
        doThings1();
        doThings2();
    }

    private void doThings1() {
        int count = dataSourceGateway.getCount();
        // do things with this count
    }

    private void doThings2() {
        int count = dataSourceGateway.getCount();
        // do things with this count, which is expected to be the same as in doThings1()
    }
}
Let's assume that DataSourceGateway might not be backed by a database that offers transactions but by an in-memory data structure, and that both DataSourceGateway and MyService are singletons.
The obvious consistency problem arises when another service modifies the data in DataSourceGateway after a call to myServiceMethod() has completed doThings1() and is about to execute doThings2().
The example is simplified, but consider also the cases where it is not the same method on dataSourceGateway that is called twice, but rather calls with some dependency between them (e.g. getCount and getAll should be consistent with each other). The reason I call this an everyday problem is that I have seen the same DataSourceGateways (or DAOs) injected in multiple places, and developers usually expect the calls to be consistent.
What would be the best way to ensure that count is the same across these 2 calls?
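The only approach I can come up with myself is to read everything needed in a single gateway call and pass the resulting immutable snapshot down, along these lines (DataSnapshot and snapshot() are made-up names, not an existing API):

public void myServiceMethod() {
    // one atomic read; both steps then operate on the same immutable view
    DataSnapshot snapshot = dataSourceGateway.snapshot();
    doThings1(snapshot.getCount());
    doThings2(snapshot.getCount());
}

But this forces a snapshot concept onto every gateway, which is why I am hoping there is an established pattern.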
I'm developing a Spring Boot web application for managing gaming servers.
I want to have a cron job that queries the servers, checks whether they have crashed, and collects relevant data such as the number of players online. This data needs to be stored and shared among the services that require it. Since this data changes often and becomes invalid once the whole application stops, I don't want to persist these stats in the database, but in application memory.
Current implementation
Currently, my implementation is pretty naive: I keep a collection as a member field of the corresponding Spring service and store the server statuses there. However, I feel this is a really bad solution, as services should be stateless, and I also don't take concurrency into account.
Example code:
@Service
public class ServersServiceImpl implements ServersService {

    private final Map<Long, ServerStats> stats = new HashMap<>(); // Map server ID -> stats

    ...

    public void startServer(Long id) {
        // ... call service to actually start server process
        ServerStats serverStats = new ServerStats(); // assuming a fresh stats object is created here
        serverStats.setRunning(true);
        stats.put(id, serverStats);
    }

    ...
}
Alternative: Using @Repository classes
I could move the collection with the data to a class annotated with @Repository, which would be semantically more correct. There, I would implement thread-safe logic for storing the data in a Java collection. Then I would inject this repository into the relevant services.
@Repository
public class ServerStatsRepository {

    private final Map<Long, ServerStats> stats = new ConcurrentHashMap<>();

    ...

    public ServerStats getServerStats(Long id) {
        return stats.get(id);
    }

    public ServerStats updateServerStats(Long id, ServerStats serverStats) {
        return stats.put(id, serverStats);
    }

    ...
}
Using Redis also came to mind, but I don't want to add too much complexity to the app.
Is my proposed solution a valid approach? Would there be a better option for handling this problem?
I'm currently checking out the following guide: https://developer.android.com/topic/libraries/architecture/guide.html
The NetworkBoundResource class:
// ResultType: Type for the Resource data
// RequestType: Type for the API response
public abstract class NetworkBoundResource<ResultType, RequestType> {

    // Called to save the result of the API response into the database
    @WorkerThread
    protected abstract void saveCallResult(@NonNull RequestType item);

    // Called with the data in the database to decide whether it should be
    // fetched from the network.
    @MainThread
    protected abstract boolean shouldFetch(@Nullable ResultType data);

    // Called to get the cached data from the database
    @NonNull @MainThread
    protected abstract LiveData<ResultType> loadFromDb();

    // Called to create the API call.
    @NonNull @MainThread
    protected abstract LiveData<ApiResponse<RequestType>> createCall();

    // Called when the fetch fails. The child class may want to reset components
    // like rate limiter.
    @MainThread
    protected void onFetchFailed() {
    }

    // returns a LiveData that represents the resource
    public final LiveData<Resource<ResultType>> getAsLiveData() {
        return result;
    }
}
I'm a bit confused here about the use of threads.
Why is @MainThread applied here for network IO?
Also, for saving into the DB, @WorkerThread is applied, whereas @MainThread is used for retrieving results.
Is it bad practice to use a worker thread by default for network IO and local DB interaction?
I'm also checking out the following demo (GithubBrowserSample): https://github.com/googlesamples/android-architecture-components
This confuses me from a threading point of view.
The demo uses the Executors framework and defines a fixed pool with 3 threads for networkIO; however, a worker task is defined for only one call, i.e. the FetchNextSearchPageTask. All other network requests seem to be executed on the main thread.
Can someone clarify the rationale?
It seems you have a few misconceptions.
Generally it is never OK to perform network calls on the main (UI) thread, but unless you have a lot of data it might be OK to fetch data from the DB on the main thread. And this is what the Google example does.
1.
The demo uses executors framework, and defines a fixed pool with 3 threads for networkIO, however in the demo only a worker task is defined for one call, i.e. the FetchNextSearchPageTask.
First of all, since Java 8 you can create a simple implementation of certain interfaces (so-called "functional interfaces") using lambda syntax. This is what happens in NetworkBoundResource:
appExecutors.diskIO().execute(() -> {
    saveCallResult(processResponse(response));
    appExecutors.mainThread().execute(() ->
            // we specifically request a new LiveData,
            // otherwise we would immediately get the last cached value,
            // which may not be updated with the latest results received from the network.
            result.addSource(loadFromDb(),
                    newData -> result.setValue(Resource.success(newData)))
    );
});
First, a task (processResponse and saveCallResult) is scheduled on a thread provided by the diskIO executor, and then, from that thread, the rest of the work is scheduled back onto the main thread.
2.
Why is #MainThread applied here for networkIO?
and
All other network requests seem to be executed on the main thread.
This is not so. Only the result wrapper, i.e. LiveData<ApiResponse<RequestType>>, is created on the main thread. The network request itself is done on a different thread. This is not easy to see because the Retrofit library does all the network-related heavy lifting and nicely hides such implementation details. Still, if you look at the LiveDataCallAdapter that wraps Retrofit into a LiveData, you can see that Call.enqueue is used, which is an asynchronous call (scheduled internally by Retrofit).
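For reference, here is a condensed sketch of that adapter (based on the GithubBrowserSample; imports omitted and details trimmed, so treat it as illustrative rather than the exact source). Note that the only thing happening on the caller's thread is constructing the LiveData:

// The network call is enqueued (asynchronously) only when the LiveData
// becomes active, and the result is delivered back via postValue.
public class LiveDataCallAdapter<R> implements CallAdapter<R, LiveData<ApiResponse<R>>> {

    private final Type responseType;

    LiveDataCallAdapter(Type responseType) {
        this.responseType = responseType;
    }

    @Override
    public Type responseType() {
        return responseType;
    }

    @Override
    public LiveData<ApiResponse<R>> adapt(Call<R> call) {
        return new LiveData<ApiResponse<R>>() {
            final AtomicBoolean started = new AtomicBoolean(false);

            @Override
            protected void onActive() {
                super.onActive();
                if (started.compareAndSet(false, true)) {
                    // asynchronous: Retrofit runs this on its own background threads
                    call.enqueue(new Callback<R>() {
                        @Override
                        public void onResponse(Call<R> call, Response<R> response) {
                            postValue(new ApiResponse<>(response));
                        }

                        @Override
                        public void onFailure(Call<R> call, Throwable throwable) {
                            postValue(new ApiResponse<R>(throwable));
                        }
                    });
                }
            }
        };
    }
}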
Actually, if not for the "pagination" feature, the example would not need the networkIO executor at all. Pagination is a complicated feature, and thus it is implemented using an explicit FetchNextSearchPageTask. This is a place where I think the Google example is not done very well: FetchNextSearchPageTask doesn't reuse the response parsing logic (i.e. processResponse) from RepoRepository but just assumes that it is trivial (which it is now, but who knows about the future...). Also, the merging job is not scheduled onto the diskIO executor, which is inconsistent with the rest of the response processing.
I have an app that tries to follow Clean Architecture, and I need to do some cache invalidation, but I don't know in which layer this should be done.
For the sake of this example, let's say I have an OrderInteractor with 2 use cases: getOrderHistory() and sendOrder(Order).
The first use case uses an OrderHistoryRepository and the second one uses an OrderSenderRepository. These repositories are interfaces with multiple implementations (MockOrderHistoryRepository and InternetOrderHistoryRepository for the first one). The OrderInteractor only interacts with these repositories through the interfaces in order to hide the real implementation.
The Mock version is a very simple dummy, but the Internet version of the history repository keeps some data in a cache to perform better.
Now I want to implement the following: when an order is sent successfully, I want to invalidate the cache of the history, but I don't know exactly where I should perform the actual cache invalidation.
My first guess is to add an invalidateCache() method to the OrderHistoryRepository and call it at the end of the sendOrder() method inside the interactor. In the InternetOrderHistoryRepository, I will just have to implement the cache invalidation and I will be good. But I will be forced to implement the method inside the MockOrderHistoryRepository as well, and it exposes to the outside the fact that some cache management is performed by the repository. I think the OrderInteractor should not be aware of this cache management, because it is an implementation detail of the Internet version of the OrderHistoryRepository.
My second guess would be to perform the cache invalidation inside the InternetOrderSenderRepository when it knows that the order was sent successfully, but that would force this repository to know the InternetOrderHistoryRepository in order to get the cache key used by that repo for cache management. And I don't want my OrderSenderRepository to have a dependency on the OrderHistoryRepository.
Finally, my third guess is to have some sort of CacheInvalidator (whatever the name) interface with a Dummy implementation used when the repository is mocked and a Real implementation used when the Interactor is using the Internet repositories. This CacheInvalidator would be injected into the Interactor, and the selected implementation would be provided by a Factory that builds both the repository and the CacheInvalidator. This means that I would have a MockedOrderHistoryRepositoryFactory - building the MockedOrderHistoryRepository and the DummyCacheInvalidator - and an InternetOrderHistoryRepositoryFactory - building the InternetOrderHistoryRepository and the RealCacheInvalidator. But here again, I don't know whether this CacheInvalidator should be used by the Interactor at the end of sendOrder() or directly by the InternetOrderSenderRepository (even though I think the latter is better, because again the interactor should probably not know that there is some cache management under the hood).
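To make that third guess concrete, the interface I have in mind would be as small as this (all names are placeholders I just made up):

public interface CacheInvalidator {
    void invalidateOrderHistory();
}

// No-op implementation paired with the mocked repositories (separate file).
public class DummyCacheInvalidator implements CacheInvalidator {
    @Override
    public void invalidateOrderHistory() {
        // nothing to invalidate in the mocked setup
    }
}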
What would be your preferred way of architecting this?
Thank you very much.
Pierre
Your second guess is correct, because caching is a detail of the persistence mechanism. E.g. if the repository were a file-based repository (e.g. on a local SSD), caching might not be an issue.
The interactor (use case) should not know about caching at all. This will make it easier to test because you don't need a real cache or mock for testing.
My second guess would be perform the cache invalidation inside the InternetOrderSenderRepository when it knows that the order was sent successfully but it will force this repository to know the InternetOrderHistoryRepository in order to get the cache key used by this repo for the cache management.
It seems that your cache key is a composite of multiple order properties and therefore you need to encapsulate the cache key creation logic somewhere for reuse.
In this case, you have the following options:
One implementation for both interfaces
You can create a class that implements both the InternetOrderSenderRepository and the InternetOrderHistoryRepository interface. In this case, you can extract the cache key generation logic into a private method and reuse it, as shown in the sketch below.
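A sketch of this option (the interface names come from the question; the method signatures and the map-based cache are assumptions made for illustration):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One class implements both repository interfaces, so the cache key format
// lives in a single private method shared by both sides.
public class InternetOrderRepository
        implements InternetOrderSenderRepository, InternetOrderHistoryRepository {

    private final Map<String, Order> cache = new ConcurrentHashMap<>();

    @Override
    public void sendOrder(Order order) {
        // ... perform the web call, then drop the now-stale cache entry
        cache.remove(cacheKey(order));
    }

    @Override
    public Order getFromHistory(Order order) {
        // ... consult the cache first, fall back to the web call on a miss
        return cache.get(cacheKey(order));
    }

    private String cacheKey(Order order) {
        return order.getId() + ":" + order.getVersion();
    }
}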
Use a utility class for the cache key creation
Simply extract the cache key creation logic into a utility class and use it in both repositories.
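For example (OrderCacheKeys is a hypothetical name; the key format is just an illustration):

// Utility class so the key format is defined exactly once and shared by
// both repository implementations.
public final class OrderCacheKeys {

    private OrderCacheKeys() {
        // no instances
    }

    public static String of(Order order) {
        return order.getId() + ":" + order.getVersion();
    }
}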
Create a cache key class
A cache key is just an arbitrary object, because a cache only has to check whether a key exists, and this means using the equals method that every object has. But to be more type-safe, most caches use a generic type for the key so that you can define your own.
Thus you can put the cache key logic and validation in a class of its own. This has the advantage that you can easily test that logic.
import java.util.Objects;

public class OrderCacheKey {

    private final Integer orderId;
    private final int version;

    public OrderCacheKey(Integer orderId, int version) {
        this.orderId = Objects.requireNonNull(orderId);
        if (version < 0) {
            throw new IllegalArgumentException("version must not be negative");
        }
        this.version = version;
    }

    public OrderCacheKey(Order order) {
        this(order.getId(), order.getVersion());
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        OrderCacheKey other = (OrderCacheKey) obj;
        return Objects.equals(orderId, other.orderId) && version == other.version;
    }

    @Override
    public int hashCode() {
        int result = 1;
        result = 31 * result + Objects.hashCode(orderId);
        result = 31 * result + version;
        return result;
    }
}
You can use this class as the key type of your cache: Cache<OrderCacheKey, Order>. Then you can use the OrderCacheKey class in both repository implementations.
Introduce an order cache interface to hide caching details
You can apply the interface segregation principle and hide the complete caching details behind a simple interface. This will make your unit tests easier, because you have to mock less.
public interface OrderCache {

    void add(Order order);

    Order get(Integer orderId, int version);

    void remove(Order order);

    void removeByKey(Integer orderId, int version);
}
You can then use the OrderCache in both repository implementations and you can also combine the interface segregation with the cache key class above.
How to apply
You can use aspect-oriented programming and one of the options above to implement the caching.
You can create a wrapper (or delegate) for each repository that applies caching and delegates to the real repository when needed. This is very similar to the aspect-oriented way; you just implement the aspect manually, as sketched below.
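A sketch of such a wrapper, reusing the OrderCache interface from above (the findOrder method on OrderHistoryRepository is an assumed signature, not from the question):

// Caching decorator: implements the same interface as the real repository,
// serves hits from the cache, and delegates misses to the wrapped instance.
public class CachingOrderHistoryRepository implements OrderHistoryRepository {

    private final OrderHistoryRepository delegate;
    private final OrderCache cache;

    public CachingOrderHistoryRepository(OrderHistoryRepository delegate, OrderCache cache) {
        this.delegate = delegate;
        this.cache = cache;
    }

    @Override
    public Order findOrder(Integer orderId, int version) {
        Order cached = cache.get(orderId, version);
        if (cached != null) {
            return cached;
        }
        Order order = delegate.findOrder(orderId, version);
        if (order != null) {
            cache.add(order);
        }
        return order;
    }
}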
I need to geocode an Address object, and then store the updated Address in a search engine. This can be simplified to taking an object, performing one long-running operation on the object, and then persisting the object. This means there is an order of operations requirement that the first operation be complete before persistence occurs.
I would like to use Akka to move this off the main thread of execution.
My initial thought was to use a pair of Futures to accomplish this, but the Futures documentation is not entirely clear about which combinators (fold, map, etc.) guarantee that one Future executes before another.
I started out by creating two functions, defferedGeocode and deferredWriteToSearchEngine, which return Futures for the respective operations. I chain them together using Future<>.andThen(new OnComplete...), but this gets clunky very quickly:
Future<Address> geocodeFuture = defferedGeocode(ec, address);

geocodeFuture.andThen(new OnComplete<Address>() {
    public void onComplete(Throwable failure, Address geocodedAddress) {
        if (geocodedAddress != null) {
            Future<Address> searchEngineFuture =
                    deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
            searchEngineFuture.andThen(new OnComplete<Address>() {
                public void onComplete(Throwable failure, Address savedAddress) {
                    // process search engine results
                }
            }, ec); // the inner andThen also needs the ExecutionContext
        }
    }
}, ec);
And then deferredGeocode is implemented like this:
private Future<Address> defferedGeocode(
        final ExecutionContext ec,
        final Address address) {
    return Futures.future(new Callable<Address>() {
        public Address call() throws Exception {
            log.debug("Geocoding Address...");
            return address;
        }
    }, ec);
}
deferredWriteToSearchEngine is pretty similar to deferredGeocode, except it takes the search engine service as an additional final parameter.
My understanding is that Futures are supposed to be used to perform calculations and should not have side effects. In this case, geocoding the address is a calculation, so I think using a Future is reasonable, but writing to the search engine is definitely a side effect.
What is the best practice here for Akka? How can I avoid all the nested calls, but ensure that both the geocoding and the search engine write are done off the main thread?
Is there a more appropriate tool?
Update:
Based on Viktor's comments below, I am trying this code out now:
ExecutionContext ec;

private Future<Address> addressBackgroundProcess(Address address) {
    Future<Address> geocodeFuture = addressGeocodeFutureFactory.defferedGeocode(address);
    return geocodeFuture.flatMap(new Mapper<Address, Future<Address>>() {
        @Override
        public Future<Address> apply(Address geoAddress) {
            return addressSearchEngineFutureFactory.deferredWriteToSearchEngine(geoAddress);
        }
    }, ec);
}
This seems to work OK, except for one issue I'm not thrilled with. We are working in a Spring IoC code base, and I would like to inject the ExecutionContext into the FutureFactory objects, but it seems wrong for this function (in our DAO) to need to be aware of the ExecutionContext.
It seems odd to me that the flatMap() function needs an EC at all, since both futures provide one.
Is there a way to maintain the separation of concerns? Am I structuring the code badly, or is this just the way it needs to be?
I thought about creating an interface in the FutureFactories that would allow chaining of FutureFactories, so the flatMap() call would be encapsulated in a FutureFactory base class, but this seems like it would be deliberately subverting an intentional Akka design decision.
Warning: Pseudocode ahead.
Future<Address> myFutureResult = deferredGeocode(ec, address).flatMap(
    new Mapper<Address, Future<Address>>() {
        public Future<Address> apply(Address geocodedAddress) {
            return deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
        }
    }, ec).map(
    new Mapper<Address, SomeResult>() {
        public SomeResult apply(Address savedAddress) {
            // Create SomeResult after deferredWriteToSearchEngine is done
        }
    }, ec);
See how it is not nested. flatMap and map are used for sequencing the operations. andThen is useful when you want a side-effecting-only operation to run to full completion before passing the result on. Of course, if you map twice on the SAME future instance, then no ordering is guaranteed, but since we are flatMapping and mapping on the returned futures (new ones, according to the docs), there is a clear data flow in our program.
I am developing a web app in Spring and Hibernate.
I am loading entities into the database. Authors, Books, Publications, etc. are my entities, which are loaded from Excel.
I have made one entity load service interface, and I have implementations of it for every entity.
My service calls the DAO implementations.
Now I am struggling to determine whether the code below violates SRP.
Also, I am always confused about how to decide on the responsibility of a class, because any class can have many methods, and each method can be performing something different. So should they be separated into different classes? Take my case: I have 4 methods, each performing a different task, so I end up with 4 different classes, one for each method. If I follow this approach (which I know is wrong), I will always end up with classes having a single method.
Also, sometimes I feel that I am moving away from domain-driven design, because I am refactoring the code on the basis of functionality.
Any suggestions on how to decide what the responsibility of a class is?
SRP stands for the single responsibility principle, and I am really confused about identifying this responsibility.
public interface EntitiesLoadService {

    void loadEntities(Object o);

    void deleteEntities(Object o);

    List getEntities();

    Object getEntity(Object o);
}
Service Implementation
@Service("authorLoadService")
@Transactional
public class AuthorEntityLoadService implements EntitiesLoadService {

    private AuthorDAO authorDao;

    @Autowired
    @Qualifier("authorDAO")
    public void setAuthorDao(AuthorDAO authorDao) {
        this.authorDao = authorDao;
    }

    @Override
    public void deleteEntities(Object o) {
        // TODO Auto-generated method stub
    }

    @Override
    public void loadEntities(Object o) {
        Set<author_pojo> author = (Set<author_pojo>) o;
        Iterator<author_pojo> itr = author.iterator();
        while (itr.hasNext()) {
            author_pojo authorPojo = itr.next();
            authorDao.save(authorPojo);
        }
    }

    @Override
    @Transactional(readOnly = true)
    public List getEntities() {
        // TODO Auto-generated method stub
        return null;
    }

    @Override
    @Transactional(readOnly = true)
    public Object getEntity(Object o) {
        String author = (String) o;
        author_pojo fetAuthor = authorDao.findOneByName(author);
        return fetAuthor;
    }
}
You have AuthorDAO, which is the class that should be doing all interactions with the persistence layer, e.g. a database.
It isn't obvious in your example, because your AuthorEntityLoadService has similar methods that just delegate to the DAO layer.
As your project and requirements grow, you will see that more methods are required for this class. These methods will be responsible for doing more than just CRUD operations on the DAO layer. They might need to interact with other services, internal or external. They might need to do multiple DAO calls.
The Single Responsibility in this case is to provide services for interacting with AuthorEntity instances.
It is one of many correct ways of implementing what you are proposing.
More specifically, my opinion on
Also I am always confused about how to decide responsibility of the
class because any class can have many methods and each method can be
performing something different.So should they be separated in
different class?
Just because you have many methods doing different things doesn't mean the responsibility isn't shared. AuthorEntityLoadService, which I would just call AuthorEntityService, manages AuthorEntity instances at the service layer. Imagine if you had one class with one method for each of create, update, retrieve, and delete for an AuthorEntity. That wouldn't make much sense.
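To make that concrete, splitting by method would leave you with classes like these (names invented for the sake of the example), fragmenting one responsibility across four types:

// Taken to the extreme, "one class per method" produces this fragmentation:
class AuthorCreateService { void create(author_pojo author) { /* save via DAO */ } }
class AuthorUpdateService { void update(author_pojo author) { /* update via DAO */ } }
class AuthorRetrieveService { author_pojo retrieveByName(String name) { /* query via DAO */ return null; } }
class AuthorDeleteService { void delete(author_pojo author) { /* delete via DAO */ } }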
And on
Any suggestions on how to decide what the responsibility is from the
perspective a class?
As further reading, try http://java.dzone.com/articles/defining-class-responsibility
Typically, in this type of n-tier architecture, your service layer is meant to provide an API of transactional (or otherwise resource-dependent) operations. The implementation of each service can use whatever resource-specific dependencies (like DAOs for a particular datasource) it needs, but it allows the service consumer to remain agnostic of these specific dependencies or resources.
So even if your service is just delegating to its resource-specific dependencies, it doesn't violate SRP because its responsibility is to define a resource-agnostic API (so that the consumer doesn't need to know all the resource-specific stuff) that specifies atomic operations (transactional if necessary).
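For instance, a trimmed-down, typed version of your service interface (hypothetical, with the Object parameters replaced by concrete types) shows how the consumer stays agnostic: nothing in it reveals DAOs, Hibernate, or any other resource:

// Resource-agnostic service API: the DAO, Hibernate, and transaction
// handling all stay inside the implementation.
public interface AuthorService {

    // load a batch of authors parsed from Excel
    void loadAuthors(Set<author_pojo> authors);

    // fetch a single author by name
    author_pojo findAuthorByName(String name);
}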