I need to geocode an Address object and then store the updated Address in a search engine. This can be simplified to taking an object, performing one long-running operation on it, and then persisting it. The order of operations matters: the first operation must be complete before persistence occurs.
I would like to use Akka to move this off the main thread of execution.
My initial thought was to use a pair of Futures to accomplish this, but the Futures documentation is not entirely clear on which methods (fold, map, etc.) guarantee that one Future executes before another.
I started out by creating two functions, defferedGeocode and deferredWriteToSearchEngine, which return Futures for the respective operations. I chain them together using Future<>.andThen(new OnComplete...), but this gets clunky very quickly:
Future<Address> geocodeFuture = defferedGeocode(ec, address);
geocodeFuture.andThen(new OnComplete<Address>() {
    public void onComplete(Throwable failure, Address geocodedAddress) {
        if (geocodedAddress != null) {
            Future<Address> searchEngineFuture = deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
            searchEngineFuture.andThen(new OnComplete<Address>() {
                public void onComplete(Throwable failure, Address savedAddress) {
                    // process search engine results
                }
            }, ec); // andThen needs the ExecutionContext here too
        }
    }
}, ec);
And then deferredGeocode is implemented like this:
private Future<Address> defferedGeocode(
        final ExecutionContext ec,
        final Address address) {
    return Futures.future(new Callable<Address>() {
        public Address call() throws Exception {
            log.debug("Geocoding Address...");
            return address;
        }
    }, ec);
}
deferredWriteToSearchEngine is pretty similar to deferredGeocode, except it takes the search engine service as an additional final parameter.
My understanding is that Futures are supposed to be used to perform calculations and should not have side effects. In this case, geocoding the address is a calculation, so I think using a Future is reasonable, but writing to the search engine is definitely a side effect.
What is the best practice here for Akka? How can I avoid all the nested calls, but ensure that both the geocoding and the search engine write are done off the main thread?
Is there a more appropriate tool?
Update:
Based on Viktor's comments below, I am trying this code out now:
ExecutionContext ec;

private Future<Address> addressBackgroundProcess(Address address) {
    Future<Address> geocodeFuture = addressGeocodeFutureFactory.defferedGeocode(address);
    return geocodeFuture.flatMap(new Mapper<Address, Future<Address>>() {
        @Override
        public Future<Address> apply(Address geoAddress) {
            return addressSearchEngineFutureFactory.deferredWriteToSearchEngine(geoAddress);
        }
    }, ec);
}
This seems to work OK except for one issue I'm not thrilled with. We are working in a Spring IoC code base, so I would like to inject the ExecutionContext into the FutureFactory objects, but it seems wrong for this function (in our DAO) to need to be aware of the ExecutionContext.
It seems odd to me that the flatMap() function needs an EC at all, since both futures provide one.
Is there a way to maintain the separation of concerns? Am I structuring the code badly, or is this just the way it needs to be?
I thought about creating an interface in the FutureFactory's that would allow chaining of FutureFactory's, so the flatMap() call would be encapsulated in a FutureFactory base class, but this seems like it would be deliberately subverting an intentional Akka design decision.
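For reference, here is roughly the factory wiring I mean (a sketch in the style of the code above; the class name is mine and the Spring bean configuration is assumed):

public class AddressGeocodeFutureFactory {

    private final ExecutionContext ec; // constructor-injected by Spring

    public AddressGeocodeFutureFactory(ExecutionContext ec) {
        this.ec = ec;
    }

    public Future<Address> defferedGeocode(final Address address) {
        return Futures.future(new Callable<Address>() {
            public Address call() throws Exception {
                return address; // geocoding happens here
            }
        }, ec);
    }
}

This keeps the ExecutionContext inside the factories; the remaining wrinkle is the one passed to flatMap() in the DAO.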
Warning: Pseudocode ahead.
Future<SomeResult> myFutureResult = deferredGeocode(ec, address).flatMap(
    new Mapper<Address, Future<Address>>() {
        public Future<Address> apply(Address geocodedAddress) {
            return deferredWriteToSearchEngine(ec, addressSearchService, geocodedAddress);
        }
    }, ec).map(
    new Mapper<Address, SomeResult>() {
        public SomeResult apply(Address savedAddress) {
            // create SomeResult after deferredWriteToSearchEngine is done
            return new SomeResult(savedAddress); // pseudocode
        }
    }, ec);
See how it is not nested. flatMap and map are used to sequence the operations. andThen is useful when you want a side-effecting-only operation to run to full completion before passing the result on. Of course, if you map twice on the SAME future instance then no ordering is guaranteed between them, but since we are flatMapping and mapping on the returned futures (new ones, according to the docs), there is a clear data flow in our program.
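To make the andThen case concrete, here is a minimal sketch in the same style (log is an assumed logger): the side effect runs to full completion, and the untouched result flows on through the returned future.

Future<SomeResult> audited = myFutureResult.andThen(new OnComplete<SomeResult>() {
    public void onComplete(Throwable failure, SomeResult result) {
        // side effect only: the value flowing out of 'audited' is unchanged
        if (failure != null) {
            log.error("pipeline failed", failure);
        } else {
            log.debug("pipeline produced: " + result);
        }
    }
}, ec);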
Related
I'm kind of forced to switch to reactive programming (and in a short time frame), using WebFlux, and I'm having a really hard time understanding it. Maybe because of the lack of examples, or because I never did functional programming.
Anyways, my question is: where do I use Mono/Flux and where can I work with normal objects? E.g. my controller is waiting for a @Valid User object; should that be @Valid Mono<User> or something like Mono<@Valid User>? If, let's say, it was just a User object, I pass it to my service layer, and I want to encode the password before saving it to the reactive MongoDB, should I write:
user.setPassword(...);
return reactiveMongoDbRepository.save(user); // returns Mono<User>, which is returned by the controller to the view
Or should it be something like
return Mono.just(user).map.flatmap(setPasswordSomewhereInThisHardToFollowChain).then.whatever.doOnSuccess(reactiveMongoDbRepository::save);
In other words, am I forced to use this pipeline thing EVERYWHERE to maintain reactiveness, or is it OK to do some steps the imperative way, like unwrapping the object, working on it, and wrapping it back?
I know my question seems silly, but I don't have the big picture at all; reading books about it hasn't really helped yet, so please be gentle with me. :)
Use pipelining when you require sequential, asynchronous and lazy execution. In all other cases (i.e. when you are using non-blocking code) you're free to choose any approach, and it's generally better to use the simplest one.
Sequential non-blocking code can be organised in plain functions that you integrate into the reactive pipeline with map/filter/doOnNext/... operators.
For example, consider the following Order price calculation code.
class PriceCalculator {
    private final Map<ProductCode, Price> prices;

    PriceCalculator(Map<ProductCode, Price> prices) {
        this.prices = prices;
    }

    PricedOrder calculatePrice(Order order) { // doesn't deal with Mono/Flux stuff
        double price = order.orderLines.stream()
                .map(orderLine -> prices.get(orderLine.productCode))
                .mapToDouble(Price::doubleValue) // Stream<Price> has no sum(); mapToDouble does
                .sum();
        return new PricedOrder(order, price);
    }
}
class PricingController {
    public Mono<PricedOrder> getPricedOrder(ServerRequest request) {
        OrderId orderId = new OrderId(request.pathVariable("orderId"));
        Mono<Order> order = orderRepository.get(orderId);
        return order.map(priceCalculator::calculatePrice);
    }
}
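Applied to the password question above, the same rule keeps the encoding step imperative and reserves the pipeline for the single asynchronous step (a sketch; passwordEncoder is an assumed non-blocking encoder):

public Mono<User> register(User user) {
    // plain, non-blocking work: no need to wrap it in Mono.just(user).map(...)
    user.setPassword(passwordEncoder.encode(user.getPassword()));
    // the only sequential, asynchronous, lazy step: the reactive save
    return reactiveMongoDbRepository.save(user);
}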
I have a request that is rather simple to formulate, but I cannot pull it off without leaking resources.
I want to return a response of type application/stream+json, featuring news events someone posted. I do not want to use WebSockets; not because I don't like them, I just want to know how to do it with a stream.
For this I need to return a Flux<News> from my REST controller that is continuously fed with news once someone posts any.
My attempt at this was to create a Publisher:
public class UpdatePublisher<T> implements Publisher<T> {

    private List<Subscriber<? super T>> subscribers = new ArrayList<>();

    @Override
    public void subscribe(Subscriber<? super T> s) {
        subscribers.add(s);
    }

    public void pushUpdate(T message) {
        subscribers.forEach(s -> s.onNext(message));
    }
}
And a simple News Object:
public class News {
    String message;
    // Constructor, getters, some properties omitted for readability...
}
And endpoints to publish news and to get the stream of news, respectively:
// ...
private UpdatePublisher<String> updatePublisher = new UpdatePublisher<>();

@GetMapping(value = "/news/ticker", produces = "application/stream+json")
public Flux<News> getUpdateStream() {
    return Flux.from(updatePublisher).map(News::new);
}

@PutMapping("/news")
public void putNews(@RequestBody News news) {
    updatePublisher.pushUpdate(news.getMessage());
}
This WORKS, but I cannot unsubscribe or access any given subscription again - so once a client disconnects, the updatePublisher will just continue to push onto a growing number of dead channels, as I have no way to call the onComplete() handler on the subscriptions.
TL;DR:
Can one push messages onto a possibly endless Flux from a different thread and still terminate the Flux on demand, without relying on a connection-reset-by-peer exception or something along those lines?
You should never try to implement the Publisher interface yourself, as it boils down to getting the Reactive Streams implementation right. This is exactly the issue you're facing here.
Instead you should use one of the generator operators provided by Reactor itself (this is actually a Reactor question, nothing specific to Spring WebFlux).
In this case, Flux.create or Flux.push are probably the best candidates, given your code uses some type of event listener to push events down the stream. See the reactor project reference documentation on that.
Without more details, it's hard to give you a concrete code sample that solves your problem. Here are a few pointers, followed by a minimal sketch:
you might want to .share() the stream of events for all subscribers if you'd like some multicast-like communication pattern
pay attention to the push/pull/push+pull model that you'd like to have here; how is the backpressure supposed to work? What if we produce more events than the subscribers can handle?
this model would only work on a single application instance. If you'd like this to work on multiple application instances, you might want to look into messaging patterns using a broker
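As a starting point, here is a minimal sketch using Flux.create (class and method names are mine; backpressure strategy and multicasting via share() are deliberately left out):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import reactor.core.publisher.Flux;
import reactor.core.publisher.FluxSink;

public class NewsStream {

    private final Set<FluxSink<News>> sinks = ConcurrentHashMap.newKeySet();

    // each subscriber gets its own sink, which is removed again on
    // cancel/complete - this fixes the "growing number of dead channels" leak
    public Flux<News> stream() {
        return Flux.create(sink -> {
            sinks.add(sink);
            sink.onDispose(() -> sinks.remove(sink));
        });
    }

    public void publish(News news) {
        sinks.forEach(sink -> sink.next(news));
    }

    // terminates the Flux on demand for all current subscribers
    public void completeAll() {
        sinks.forEach(FluxSink::complete);
    }
}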
I have a webapp in which I have to return the results from a MongoDB find() to the front-end from my Java back-end.
I am using the async Java driver, and the only way I can think of to return the results from Mongo is something like this:
public String getDocuments() {
    ...
    collection.find(query).map(Document::toJson)
        .into(new HashSet<String>(), new SingleResultCallback<HashSet<String>>() {
            @Override
            public void onResult(HashSet<String> strings, Throwable throwable) {
                // here I have to get all the JSON documents in the set,
                // make a whole JSON string and wake the main thread
            }
        });
    // here I have to put the main thread to wait until I get the data in
    // the onResult() method so I can return the string back to the front-end
    ...
    return jsonString;
}
Is this assumption right, or is there another way to do it?
Asynchronous APIs (any API based on callbacks, not necessarily MongoDB) can be a true blessing for multithreaded applications. But to really benefit from them, you need to design your whole application architecture in an asynchronous fashion. This is not always feasible, especially when it is supposed to fit into a given framework which isn't built on callbacks.
So sometimes (like in your case) you just want to use an asynchronous API in a synchronous fashion. In that case, you can use the class CompletableFuture.
This class provides (among others) the two methods T get() and complete(T value). get() will block until complete() is called to provide the return value (if complete() is called before get(), get() returns immediately with the provided value).
public String getDocuments() throws InterruptedException, ExecutionException {
    ...
    CompletableFuture<String> result = new CompletableFuture<>(); // <-- create an empty, uncompleted Future
    collection.find(query).map(Document::toJson)
        .into(new HashSet<String>(), new SingleResultCallback<HashSet<String>>() {
            @Override
            public void onResult(HashSet<String> strings, Throwable throwable) {
                // here I have to get all the JSON documents in the set and
                // make a whole JSON string
                result.complete(wholeJsonString); // <-- resolves the future
            }
        });
    return result.get(); // <-- blocks until result.complete is called (throws checked exceptions)
}
The get() method of CompletableFuture also has an overload with a timeout parameter. I recommend using it to prevent your program from accumulating hanging threads when the callback is not called for whatever reason. It is also a good idea to wrap your whole callback in a try block and call result.complete in a finally block, to make sure the result always gets resolved, even when there is an unexpected error during your callback.
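A sketch of both suggestions combined (the 30-second timeout and the buildJson helper are placeholders of mine; I use completeExceptionally rather than a literal finally block, which gives the same guarantee):

CompletableFuture<String> result = new CompletableFuture<>();
collection.find(query).map(Document::toJson)
    .into(new HashSet<String>(), new SingleResultCallback<HashSet<String>>() {
        @Override
        public void onResult(HashSet<String> strings, Throwable throwable) {
            try {
                if (throwable != null) {
                    result.completeExceptionally(throwable); // propagate the driver error
                } else {
                    result.complete(buildJson(strings)); // buildJson: hypothetical helper
                }
            } catch (RuntimeException e) {
                result.completeExceptionally(e); // ensure get() never hangs
            }
        }
    });
return result.get(30, TimeUnit.SECONDS); // throws TimeoutException if the callback never fires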
Yes, you're right.
That's the correct behaviour of the Mongo async driver (see MongoIterable.into).
However, why don't you use the sync driver in this situation? Is there any reason to use the async method?
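With the sync driver the whole method collapses to a few lines, since MongoIterable.into exists there as well (a sketch; how you assemble the final JSON string is up to you):

public String getDocuments() {
    Set<String> jsonDocs = collection.find(query)
            .map(Document::toJson)
            .into(new HashSet<String>()); // blocks right here, no callback needed
    return "[" + String.join(",", jsonDocs) + "]";
}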
The program I am working on has a distributed architecture, more precisely the Broker-Agent pattern. The broker sends messages to its corresponding agent in order to tell the agent to execute a task. Each message sent contains the target task information (the task name, configuration properties needed for the task to perform, etc.). In my code, each task on the agent side is implemented in a separate class, like:
public class Task1 {}
public class Task2 {}
public class Task3 {}
...
Messages are in JSON format like:
{
    "taskName": "Task1", // put the class name here
    "config": {
    }
}
So what I need is to associate the message sent from the broker with the right task on the agent side.
I know one way is to put the target task class name in the message so that the agent can create an instance of that task class, using reflection on the task name extracted from the message, like:
Class.forName(className).getConstructor(String.class).newInstance(arg);
I want to know the best practice for implementing this association. The number of tasks is growing, and writing class names as strings is error-prone and hard to maintain.
If you're that specific about class names you could even think about serializing task objects and sending them directly. That's probably simpler than your reflection approach (though even more tightly coupled).
But usually you don't want that kind of coupling between Broker and Agent. A broker needs to know which task types there are and how to describe a task in a way that everybody understands (like in JSON). It doesn't / shouldn't know how the Agent implements the task, or even which language the Agent is written in. (That doesn't mean it's a bad idea to define task names in a place that is common to both code bases.)
So you're left with finding a good way to construct objects (or call methods) inside your agent based on some string. The common solution for that is some form of factory pattern (example: http://alvinalexander.com/java/java-factory-pattern-example), together with a Map<String, Factory> like:
interface Task {
    void doSomething();
}

interface Factory {
    Task makeTask(String taskDescription);
}

Map<String, Factory> taskMap = new HashMap<>();

void init() {
    taskMap.put("sayHello", new Factory() {
        @Override
        public Task makeTask(String taskDescription) {
            return new Task() {
                @Override
                public void doSomething() {
                    System.out.println("Hello " + taskDescription);
                }
            };
        }
    });
}

void onTask(String taskName, String taskDescription) {
    Factory factory = taskMap.get(taskName);
    if (factory == null) {
        System.out.println("Unknown task: " + taskName);
        return; // without this, the next line would throw a NullPointerException
    }
    Task task = factory.makeTask(taskDescription);
    // execute task somewhere
    new Thread(task::doSomething).start();
}
http://ideone.com/We5FZk
And if you want it fancy, consider annotation-based reflection magic. It depends on how many task classes there are; the more there are, the more effort it's worth putting into an automagic solution that hides the complexity from you.
For example, the Map above could be filled automatically by adding some classpath scanning for classes of the right type with some annotation that holds the string(s), as sketched below. Or you could let some DI framework inject all the things that need to go into the map. DI in larger projects usually solves those kinds of issues really well: https://softwareengineering.stackexchange.com/questions/188030/how-to-use-dependency-injection-in-conjunction-with-the-factory-pattern
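A sketch of the annotation variant, assuming a Spring context (the @TaskName annotation and the TaskRegistry class are made-up names):

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

import org.springframework.context.ApplicationContext;

@Retention(RetentionPolicy.RUNTIME)
@interface TaskName {
    String value();
}

class TaskRegistry {

    private final Map<String, Factory> taskMap = new HashMap<>();

    // collects every Factory bean carrying @TaskName, so nobody has to
    // remember to extend init() by hand when a new task class is added
    TaskRegistry(ApplicationContext context) {
        for (Factory factory : context.getBeansOfType(Factory.class).values()) {
            TaskName name = factory.getClass().getAnnotation(TaskName.class);
            if (name != null) {
                taskMap.put(name.value(), factory);
            }
        }
    }

    Factory lookup(String taskName) {
        return taskMap.get(taskName);
    }
}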
And besides writing your own distribution system, you can probably use existing ones. (And reuse rather than reinvent is a best practice.) Maybe http://www.typesafe.com/activator/template/akka-distributed-workers or, more generally, http://twitter.github.io/finagle/ works in your context. But there are way too many other open-source distributed things that cover different aspects to name all the interesting ones.
There are countless questions here about how to solve the "could not initialize proxy" problem: via eager fetching, keeping the transaction open, opening another one, OpenEntityManagerInViewFilter, and whatever.
But is it possible to simply tell Hibernate to ignore the problem and pretend the collection is empty? In my case, not fetching it before simply means that I don't care.
This is actually an XY problem with the following Y:
I'm having classes like
class Detail {
    @ManyToOne(optional = false) Master master;
    ...
}

class Master {
    @OneToMany(mappedBy = "master") List<Detail> details;
    ...
}
and want to serve two kinds of requests: one returning a single master with all its details, and another one returning a list of masters without details. The result gets converted to JSON by Gson.
I've tried session.clear() and session.evict(master), but they don't touch the proxy used in place of details. What worked was
master.setDetails(nullOrSomeCollection)
which feels rather hacky. I'd prefer the "ignorance" as it'd be applicable generally without knowing what parts of what are proxied.
Writing a Gson TypeAdapter that ignores instances of AbstractPersistentCollection with initialized=false could be a way, but that would depend on org.hibernate.collection.internal, which is surely no good thing. Catching the exception in the TypeAdapter doesn't sound much better.
Update after some answers
My goal is not "get the data loaded instead of the exception", but "get null instead of the exception".
Dragan raises a valid point that forgetting to fetch and returning wrong data would be much worse than an exception. But there's an easy way around it:
do this for collections only
never use null for them in the entities themselves
return null rather than an empty collection as the indication of unfetched data
This way, the result can never be wrongly interpreted. Should I ever forget to fetch something, the response will contain null, which is invalid.
You could utilize Hibernate.isInitialized, which is part of the Hibernate public API.
So, in the TypeAdapter you can add something like this:
if ((value instanceof Collection) && !Hibernate.isInitialized(value)) {
    result = new ArrayList();
}
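For context, here is roughly how that check could sit in a complete Gson TypeAdapterFactory (a sketch; it relies only on Hibernate.isInitialized from the public API, and writes null to match the null-means-unfetched convention from the question):

import java.io.IOException;
import java.util.Collection;

import org.hibernate.Hibernate;

import com.google.gson.Gson;
import com.google.gson.TypeAdapter;
import com.google.gson.TypeAdapterFactory;
import com.google.gson.reflect.TypeToken;
import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonWriter;

public class LazyCollectionAdapterFactory implements TypeAdapterFactory {

    @Override
    public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> type) {
        if (!Collection.class.isAssignableFrom(type.getRawType())) {
            return null; // not a collection: let Gson handle it normally
        }
        final TypeAdapter<T> delegate = gson.getDelegateAdapter(this, type);
        return new TypeAdapter<T>() {
            @Override
            public void write(JsonWriter out, T value) throws IOException {
                if (value != null && !Hibernate.isInitialized(value)) {
                    out.nullValue(); // unfetched collection -> null, no exception
                } else {
                    delegate.write(out, value);
                }
            }

            @Override
            public T read(JsonReader in) throws IOException {
                return delegate.read(in);
            }
        };
    }
}

Registered once via new GsonBuilder().registerTypeAdapterFactory(...), this serializes every uninitialized collection as null.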
However, in my modest opinion your approach in general is not the way to go.
"In my case, not fetching it before simply means that I don't care."
Or it means you forgot to fetch it and now you are returning wrong data (worse than getting the exception; the consumer of the service thinks the collection is empty, but it is not).
I would not like to propose "better" solutions (it is not the topic of the question, and each approach has its own advantages), but the way I solve issues like this in most use cases (and one of the commonly adopted ways) is using DTOs: simply define a DTO that represents the response of the service, fill it in the transactional context (no LazyInitializationExceptions there), and give it to the framework that will transform it into the service response (JSON, XML, etc).
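A minimal sketch of that idea for the Master/Detail classes from this question (getId and getName are assumed accessors):

import java.util.List;
import java.util.stream.Collectors;

// filled inside the transactional context, so no LazyInitializationException
class MasterDto {

    final long id;
    final List<String> detailNames; // plain data: no entities, no proxies

    MasterDto(Master master, boolean withDetails) {
        this.id = master.getId();
        this.detailNames = withDetails
                ? master.getDetails().stream()
                        .map(Detail::getName)
                        .collect(Collectors.toList())
                : null; // null = not fetched, as per the question's convention
    }
}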
What you can try is a solution like the following.
Creating an interface named LazyLoader:
@FunctionalInterface // Java 8
public interface LazyLoader<T> {
    void load(T t);
}
And in your Service
public class Service {

    public List<Master> getWithDetails(LazyLoader<Master> loader) {
        // code to get masterList from the session
        for (Master master : masterList) {
            loader.load(master);
        }
        return masterList; // was missing: the method must return the loaded list
    }
}
And call this service like below:
service.getWithDetails(new LazyLoader<Master>() {
    public void load(Master master) {
        for (Detail detail : master.getDetails()) {
            detail.getId(); // this will load the detail
        }
    }
});
And in Java 8 you can use a lambda, as LazyLoader has a Single Abstract Method (SAM).
service.getWithDetails(master -> {
    for (Detail detail : master.getDetails()) {
        detail.getId(); // this will load the detail
    }
});
You can use the solution above together with session.clear and session.evict(master).
I raised a similar question in the past (why a dependent collection isn't evicted when the parent entity is), and it resulted in an answer which you could try for your case.
The solution for this is to use queries instead of associations (one-to-many or many-to-many). Even one of the original authors of Hibernate said that collections are a feature and not an end goal.
In your case you get better flexibility by removing the collection mapping and simply fetching the associated relations when you need them in your data access layer.
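A sketch of what that looks like with a plain JPQL query in the data access layer (entityManager is assumed to be available there):

// fetch the details only for the single-master request; the
// list-of-masters request simply never runs this query
List<Detail> details = entityManager
        .createQuery("select d from Detail d where d.master = :master", Detail.class)
        .setParameter("master", master)
        .getResultList();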
You could create a Java proxy for every entity, so that every method is surrounded by a try/catch block that returns null when a LazyInitializationException is caught.
For this to work, all your entities would need to implement an interface, and you'd need to reference this interface (instead of the entity class) throughout your program.
If you can't (or just don't want to) use interfaces, then you could try to build a dynamic proxy with Javassist or cglib, or even manually, as explained in this article.
If you go with common Java proxies, here's a sketch:
public static <T> T ignoringLazyInitialization(
        final Object entity,
        final Class<T> entityInterface) {

    return (T) Proxy.newProxyInstance(
        entityInterface.getClassLoader(),
        new Class[] { entityInterface },
        new InvocationHandler() {
            @Override
            public Object invoke(
                    Object proxy,
                    Method method,
                    Object[] args) throws Throwable {
                try {
                    return method.invoke(entity, args);
                } catch (InvocationTargetException e) {
                    Throwable cause = e.getTargetException();
                    if (cause instanceof LazyInitializationException) {
                        return null;
                    }
                    throw cause;
                }
            }
        });
}
So, if you have an entity A as follows:
public interface A {
    // getters & setters and other methods DEFINITIONS
}
with its implementation:
public class AImpl implements A {
    // getters & setters and other methods IMPLEMENTATIONS
}
Then, assuming you have a reference to the entity class (as returned by Hibernate), you could create a proxy as follows:
AImpl entityAImpl = ...; // some query, load, etc
A entityA = ignoringLazyInitialization(entityAImpl, A.class);
NOTE 1: You'd need to proxy collections returned by Hibernate as well (left as an exercise to the reader) ;)
NOTE 2: Ideally, you should do all this proxying stuff in a DAO or in some type of facade, so that everything is transparent to the user of the entities
NOTE 3: This is by no means optimal, since it creates a stack trace for every access to a non-initialized field.
NOTE 4: This works, but adds complexity; consider if it's really necessary.