Neo4j, SDN4, ActiveMQ multiple consumers and data synchronization - java

In order to speed up data consumption in my application (Spring Boot, Neo4j database, Spring Data Neo4j 4) I have introduced Apache ActiveMQ and configured 10 concurrent consumers.
Right after that I ran into an issue with counter updates.
I execute the following method from my Apache ActiveMQ consumer:
@Override
public Decision create(String name, String description, String url, String imageUrl, boolean multiVotesAllowed, Long parentDecisionId, User user) {
    Decision parentDecision = null;
    if (parentDecisionId != null) {
        parentDecision = ofNullable(findById(parentDecisionId))
                .orElseThrow(() -> new EntityNotFoundException("Parent decision with a given id not found"));
    }
    Decision decision = decisionRepository.save(new Decision(name, description, url, imageUrl, multiVotesAllowed, parentDecision, user), user);
    if (parentDecision != null) {
        updateTotalChildDecisions(parentDecision, 1);
    }
    return decision;
}
Inside this method I do some logic and then update the parentDecision.totalChildDecisions counter:
@Override
public Decision updateTotalChildDecisions(Decision decision, Integer increment) {
    decision.setTotalChildDecisions(decision.getTotalChildDecisions() + increment);
    return decisionRepository.save(decision);
}
When run with concurrent consumers this counter doesn't match the actual number of child decisions in the database, but in a single-threaded environment (1 ActiveMQ consumer) everything works fine.
I think the main issue is that during the totalChildDecisions update the parentDecision refers to a stale SDN 4 object whose totalChildDecisions value is out of date. How do I correctly update parentDecision.totalChildDecisions?
How do I properly synchronize my code so that the counters stay correct with concurrent ActiveMQ consumers?
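One direction worth exploring (a sketch, not part of the original post): push the increment into a single Cypher statement through an SDN 4 @Query repository method, so the application never does a read-modify-write on a possibly stale entity. The repository and method names below are hypothetical, and it is worth verifying that Neo4j's write locking gives the expected behaviour for concurrent increments.
import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.GraphRepository;
import org.springframework.data.repository.query.Param;

public interface DecisionRepository extends GraphRepository<Decision> {

    // Hypothetical sketch: increment the counter inside the database instead of
    // reading it into the application and writing it back.
    @Query("MATCH (d:Decision) WHERE id(d) = {id} " +
           "SET d.totalChildDecisions = coalesce(d.totalChildDecisions, 0) + {increment}")
    void incrementTotalChildDecisions(@Param("id") Long id, @Param("increment") int increment);
}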

Related

Apache Camel: access Route after polling

I'm using Camel JPA endpoints to poll a database and copy the data to a second one.
To avoid polling duplicates, I'm planning to save the highest ID of the copied data and only poll data with an ID higher than that.
To save a few database writes, I want to write back the highest ID after the current polling / copying run is over, not for every single data element. I can access the element (and its ID) in the Camel Route class:
private Long startId = 0L;
private Long lastId = 0L;

from("jpa://Data").routeId("dataRoute")
    .onCompletion().onCompleteOnly().process(ex -> {
        if (lastId > startId) {
            startId = lastId;
            logger.info("New highest ID: {}", startId);
        }
    }).end()
    .process(ex -> {
        Data data = ex.getIn().getBody(Data.class);
        lastId = data.getId();
        NewData newData = (NewData) convertData(data);
        ex.getMessage().setBody(newData);
    }).to("jpa://NewData");
Now I want to save startId after the current polling is over. To do so, I overrode the PollingConsumerPollStrategy with my custom one where I want to access lastId inside the commit method (which gets executed exactly when I want it to, after the current polling is complete).
However, I can't access the route there. I tried via the route ID:
@Override
public void commit(Consumer consumer, Endpoint endpoint, int polledMessages) {
    var route = (MyRoute) endpoint.getCamelContext().getRoute("dataRoute");
    var lastId = route.getLastId();
    log.debug("LastID: {}", lastId);
}
However, I'm getting a ClassCastException (DefaultRoute cannot be cast to MyRoute). Is there something wrong with how I'm trying to get the ID from my route?
I would do it a bit differently.
Instead of using RouteBuilder instance variables for storing startId and lastId, you can put these values into the GlobalOptions (which is basically a map of key-value pairs) of the current CamelContext.
This way, you can easily obtain their value using:
public void commit(Consumer consumer, Endpoint endpoint, int polledMessages) {
    String lastId = endpoint.getCamelContext().getGlobalOption("lastId");
}
It is also (theoretically) a better implementation because it supports potential concurrent executions, as the IDs are shared across all instances running in the context.
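For completeness, a minimal sketch (not part of the original answer) of how the route's processor could store lastId as a global option. It assumes the map returned by getGlobalOptions() is the context's live, mutable map, and since global options hold String values the ID is stored and read back as a String:
.process(ex -> {
    Data data = ex.getIn().getBody(Data.class);
    // store the highest ID seen so far in the CamelContext's global options
    ex.getContext().getGlobalOptions().put("lastId", String.valueOf(data.getId()));
    ex.getMessage().setBody((NewData) convertData(data));
}).to("jpa://NewData")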

Seeking to a Kafka Offset with Spring Cloud Stream

I have an event-sourced service that listens to a Kafka topic and saves state in a relational DB.
Considering a suitable restoration strategy for this service (i.e. how to restore the DB in a disaster recovery scenario), one option would be to save the current offset in the DB, take snapshots, and restore from a snapshot. In this scenario the service would need to seek to the offset when started in 'restoration mode'.
I am using Spring Cloud Stream, and was wondering if the framework provides any mechanism for seeking to an offset?
I realise another option for restoration would be to simply replay all the events from scratch, but that's not an ideal option for some of my microservices.
If you're talking about a disaster, what makes you think you can write anything to the DB?
In other words, you may end up having to de-duplicate at least one event (or at least account for that possibility), so de-duplication is still something you have to deal with.
I understand your concern with replay (you simply don't want to replay from the beginning), but you can store periodic snapshots, which would ensure you have a relatively fixed number of events that may need to be reprocessed/de-duplicated.
That said, Kafka maintains the current offset, so you can rely on Kafka's natural transactional features to ensure that the next time you start your microservice it will begin from the last offset that was not successfully processed.
There is a KafkaBindingRebalanceListener interface that you can use:
@Slf4j
@Component
public class KafkaRebalanceListener implements KafkaBindingRebalanceListener {

    @Value("${config.kafka.topics.offsets:null}")
    private String topicOffsets;

    @Override
    public void onPartitionsAssigned(String bindingName, Consumer<?, ?> consumer, Collection<TopicPartition> partitions, boolean initial) {
        if (topicOffsets != null && initial) {
            final Optional<Map<TopicPartition, Long>> offsetsOptional = parseOffset(topicOffsets);
            if (offsetsOptional.isPresent()) {
                final Map<TopicPartition, Long> offsetsMap = offsetsOptional.get();
                partitions.forEach(tp -> {
                    if (offsetsMap.containsKey(tp)) {
                        final Long offset = offsetsMap.get(tp);
                        try {
                            log.info("Seek topic {} partition {} to offset {}", tp.topic(), tp.partition(), offset);
                            consumer.seek(tp, offset);
                        } catch (Exception e) {
                            log.error("Unable to set offset {} for topic {} and partition {}", offset, tp.topic(), tp.partition());
                        }
                    }
                });
            }
        }
    }

    private Optional<Map<TopicPartition, Long>> parseOffset(String offsetParam) {
        if (offsetParam == null || offsetParam.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(Arrays.stream(offsetParam.split(","))
                .flatMap(slice -> {
                    String[] items = slice.split("\\|");
                    String topic = items[0];
                    return Arrays.stream(Arrays.copyOfRange(items, 1, items.length))
                            .map(r -> {
                                String[] record = r.split(":");
                                int partition = Integer.parseInt(record[0]);
                                long offset = Long.parseLong(record[1]);
                                return new AbstractMap.SimpleEntry<>(new TopicPartition(topic, partition), offset);
                            });
                }).collect(Collectors.toMap(AbstractMap.SimpleEntry::getKey, AbstractMap.SimpleEntry::getValue)));
    }
}
The config.kafka.topics.offsets property looks like this, but you can use any format:
String topicOffsets = "topic2|1:100|2:120|3:140,topic3|1:1000|2:1200|3:1400";
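To make the format concrete, this is the mapping that parseOffset builds from the example above (entries are topic|partition:offset, separated by commas):
// topic2, partition 1 -> offset 100
// topic2, partition 2 -> offset 120
// topic2, partition 3 -> offset 140
// topic3, partition 1 -> offset 1000
// topic3, partition 2 -> offset 1200
// topic3, partition 3 -> offset 1400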

Java8 CompletableFuture conditional chaining

I have read many Java 8 CompletableFuture tutorials; most of them are basically the same. They all talk about the basic methods thenAccept / thenApply / thenCombine for building a pipeline flow.
But when it comes to a real-world problem, I find it hard to organize CompletableFutures coming from different services. For example:
interface Cache {
    CompletableFuture<Bean> getAsync(long id);
    CompletableFuture<Boolean> saveAsync(Bean bean);
}

interface DB {
    CompletableFuture<Bean> getAsync(long id);
}
The service logic is quite simple: get the data from the Cache; if it exists, return it to the client; if not, get it from the DB; if it exists there, save it back to the Cache and return it to the client; if it exists in neither, return an error to the client.
Using a synchronous API this would be quite straightforward, but with an asynchronous API there are many pipelines and many conditional branches, and I cannot figure out how to implement this using the CompletableFuture API.
If you don't care about the result of saving into the cache, and if you want to throw an exception when the bean is not found, then it can be done e.g. like this:
CompletableFuture<Bean> findBeanAsync(long id, Cache cache, DB db) {
    return cache.getAsync(id).thenCompose(bean -> {
        if (bean != null) {
            return CompletableFuture.completedFuture(bean);
        }
        return db.getAsync(id).thenApply(dbBean -> {
            if (dbBean == null) {
                throw new RuntimeException("bean not found with id " + id);
            }
            cache.saveAsync(dbBean);
            return dbBean;
        });
    });
}
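For illustration, a hypothetical caller (the names are assumed, not from the original answer) could attach further stages and handle the not-found case like this:
findBeanAsync(42L, cache, db)
        .thenAccept(bean -> System.out.println("Got bean: " + bean))
        .exceptionally(ex -> {
            // reached when the bean was found in neither the cache nor the DB
            System.err.println("Lookup failed: " + ex.getMessage());
            return null;
        });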

Redis insertion lag for sorted sets

I am trying to push a small amount of data (about 50 bytes) from my application (written in Java using the Jedis driver) into a sorted set with about 360 members (each member also contains a very small amount of data). I am experiencing a 60-90 second lag between my application making the insert and the result showing up on my Redis server (a separate server in a different data center). This happens consistently. At first I thought there was something in my application causing the query to hang and then execute a minute later, but that's not the case, because I can shut my application server down entirely immediately after running the query and the new item still shows up a minute later in Redis. In addition, when I removed all elements of the set and tried the insert again, it happened immediately (which is the intended behavior).
This is entirely in a test environment with no other traffic hitting either server; my Redis server hardly has any data in it and has plenty of RAM and CPU. Latency between my application server and the Redis server is approximately 50 ms.
Is there a configuration setting that I'm missing that could be causing such a delay?
Thanks in advance.
EDIT: Here is my code for inserting
public void save(RedisEntity entity) {
    String key = entity.getKey();
    String value = entity.getValue();
    Jedis jedis = database.getJedis();
    try {
        if (entity instanceof RedisListEntity) {
            if (((RedisListEntity) entity).isSavePushesLeft()) {
                jedis.lpush(key, value);
            } else {
                jedis.rpush(key, value);
            }
        } else if (entity instanceof RedisSortedSetEntity) {
            Long score = ((RedisSortedSetEntity) entity).getScore();
            jedis.zadd(key, score, value);
        } else if (entity instanceof RedisSetEntity) {
            jedis.sadd(key, value);
        } else {
            jedis.set(key, value);
        }
    } catch (JedisConnectionException e) {
        raiseErrorForSave(key, value);
        database.returnBrokenResourceToPool(jedis);
    } finally {
        database.returnResourceToPool(jedis);
    }
}
And retrieval (although keep in mind the key doesn't show up in redis-cli on the redis server for 1-2 minutes after insertion, so this code was never even part of the problem):
public Set<String> getSortedSetMembersWithRangeRankRev(String key, double min, double max, int start, int size) {
    Set<String> result = null;
    Jedis jedis = database.getJedis();
    try {
        result = jedis.zrevrangeByScore(key, max, min, start, size);
    } catch (JedisConnectionException e) {
        logger.warn("Failed to retrieve set for key " + key);
        logger.warn(e.getMessage());
        database.returnBrokenResourceToPool(jedis);
    } finally {
        database.returnResourceToPool(jedis);
    }
    return result;
}
EDIT: Some more info - I restarted my Redis server after it had been running for a few days (with almost no traffic, as it is in a development environment) and updates now seem to come through instantaneously as the set approaches 1000 members. This problem is still troubling, though, and I would like to identify the cause and prevent it from happening in the future - until then I cannot release this into a production environment.

Akka actors and futures: Understanding by example

I'm trying to learn Akka actors and futures, but after reading the docs at http://akka.io and doing http://doc.akka.io/docs/akka/2.0.2/intro/getting-started-first-java.html I'm still having some issues with understanding. I guess calculating the value of Pi is a thing a lot of people can relate to, but not me =). I have searched around a bit but haven't found any examples that suit me. Therefore I thought that I would take some real-life code of mine, throw it in here, and exchange it for an example of how to do this with Akka.
Ok so here we go:
I have a Java Play 2 application where I need to take some data from my DB and index it in my Elasticsearch instance.
I call the DB and get the ids of the venues.
I then split the list and create a couple of callable index tasks.
After that I invoke all tasks; each task collects the venues for its assigned ids from the DB.
Each venue is then indexed into the Elasticsearch instance and made searchable.
Done.
Application.java:
public class Application extends Controller {

    private static final int VENUE_BATCH = 1000;
    private static int size;

    public static Result index() {
        List<Long> venueIds = DbService.getAllVenueIds();
        size = venueIds.size();
        Logger.info("Will index " + size + " items in total.");
        ExecutorService service = Executors.newFixedThreadPool(getRuntime().availableProcessors());
        int startIx = 0;
        Collection<Callable<Object>> indexTasks = new ArrayList<Callable<Object>>();
        do {
            int endIx = Math.min(startIx + VENUE_BATCH, size);
            List<Long> subList = venueIds.subList(startIx, endIx);
            VenueIndexTask indexTask = new VenueIndexTask(subList);
            indexTasks.add(indexTask);
        } while ((startIx += VENUE_BATCH) < size);
        Logger.info("Invoking all tasks!");
        try {
            service.invokeAll(indexTasks);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return ok(index.render("Done indexing."));
    }
}
VenueTask:
public class VenueIndexTask implements Callable<Object> {

    private List<Long> idSubList;

    public VenueIndexTask(List<Long> idSubList) {
        this.idSubList = idSubList;
        Logger.debug("Creating task which will index " + idSubList.size() + " items. " +
                "Range: " + rangeAsString() + ".");
    }

    @Override
    public Object call() throws Exception {
        List<Venue> venues = DbService.getVenuesForIds(idSubList);
        Logger.debug("Doing some indexing: " + venues.size());
        for (Venue venue : venues) {
            venue.index();
        }
        return null;
    }

    private String rangeAsString() {
        return "[" + idSubList.get(0) + "-" + idSubList.get(idSubList.size() - 1) + "]";
    }
}
Venue:
@IndexType(name = "venue")
public class Venue extends Index {

    private String name;

    // Find method static for request
    public static Finder<Venue> find = new Finder<Venue>(Venue.class);

    public Venue() {
    }

    public Venue(String id, String name) {
        super.id = id;
        this.name = name;
    }

    @Override
    public Map toIndex() {
        HashMap map = new HashMap();
        map.put("id", super.id);
        map.put("name", name);
        return map;
    }

    @Override
    public Indexable fromIndex(Map map) {
        if (map == null) {
            return this;
        }
        this.name = (String) map.get("name");
        return this;
    }
}
So all you Akka people out there, go nuts! And please do as much as you can: propose cool futures functionality that could be used, or any other knowledge/code that could help me learn this stuff.
How I like to think of Akka (or any other message-based system) is as a conveyor belt, like in a factory. A simplified way of thinking in Actors could be placing a pizza order.
You, the hungry customer (Actor/Role), send an order (a Message) to the Pizza Shop.
Customer service (Actor/Role) takes your order, gives you the order number (Future)
If you were impatient, you might have waited on the phone/internet/in the shop until you got your pizza (a synchronous/blocking transaction); otherwise you would be happy with the order number and check up on it later (non-blocking).
Customer service sends the message to the Chefs (Actor) under the supervision of Kitchen Manager (Actor). This is a very process heavy kitchen, with hierarchy. Akka likes that. See Supervision
Chef creates a new Pizza and attaches the details of the order (A new message) and passes that to the delivery boy (Actor) via the delivery manager (Supervisor Actor).
During this process, your order details haven't changed; that would be a nightmare. You would not be happy if you got pepperoni when you wanted plain cheese! All messages should be immutable! However, the message may be different for different actors: a delivery boy expects a Pizza with the order details attached, while the chef expects an order. When a message needs to change, a new message is created.
Each actor is good at one role; how effective would it be if one guy had to do all the tasks? It may be that some actors outnumber others (e.g. 10 threads for Chefs, 2 for Delivery boys, 1 for Customer Service).
Blocking behavior is a pain: imagine customer service waiting for the chef and the delivery boy before seeing the next customer.
Hopefully I've helped you a little; this is a huge topic and a big change of mindset. Good luck!
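To tie the analogy back to the indexing problem, here is a minimal sketch (hypothetical names, assuming the Akka 2.0.x "classic" untyped-actor Java API from the question's tutorial) of how the VenueIndexTask could become a worker actor behind a round-robin router. It is an illustration of the idea, not a drop-in replacement:
import java.io.Serializable;
import java.util.Collections;
import java.util.List;
import akka.actor.UntypedActor;

// Immutable message carrying one batch of venue ids to index (the "order").
class IndexBatch implements Serializable {
    public final List<Long> venueIds;
    public IndexBatch(List<Long> venueIds) {
        this.venueIds = Collections.unmodifiableList(venueIds);
    }
}

// The "chef": a worker actor that indexes one batch per received message.
class VenueIndexer extends UntypedActor {
    @Override
    public void onReceive(Object message) {
        if (message instanceof IndexBatch) {
            for (Venue venue : DbService.getVenuesForIds(((IndexBatch) message).venueIds)) {
                venue.index();
            }
        } else {
            unhandled(message);
        }
    }
}

// Wiring sketch (e.g. inside the Play controller, replacing the ExecutorService):
//   ActorSystem system = ActorSystem.create("indexing");
//   ActorRef indexers = system.actorOf(
//           new Props(VenueIndexer.class).withRouter(new RoundRobinRouter(8)), "indexers");
//   indexers.tell(new IndexBatch(subList));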
Coursera currently runs a course on reactive programming whose last three lectures cover Akka and the actor model. This includes video lectures and homework (in Scala, though, not Java).
While you are too late to receive the full certificate, you can still join the course and just check out the last three weeks.
https://class.coursera.org/reactive-001/class
