Zookeeper missed events on successive changes - java

I currently have a setup with a single ZooKeeper node and use Curator to access the data. Reading is done through a Curator TreeCache.
I have the following test:
@Test
public void test_callback_successive_changes_success_global_new_version() throws InterruptedException {
    ZookeeperTestsHelper.createNewNodeInZookeeperSingleCommand("/my/path/new_node", "some string4", curator);
    ZookeeperTestsHelper.createNewNodeInZookeeperSingleCommand("/my/path/new_node", "some string5", curator);
    assertThat(<check what events the listener heard>);
}
Note that "new_node" does not exist before the test is executed.
public static void createNewNodeInZookeeperSingleCommand(final String fullPathOfValueInZookeeper, final String valueToSet, final CuratorFramework curator) {
    try {
        if (curator.checkExists().forPath(fullPathOfValueInZookeeper) != null) {
            System.out.println("E: " + valueToSet);
            // Node already exists, just set it
            curator.setData().forPath(fullPathOfValueInZookeeper, valueToSet.getBytes());
        } else {
            System.out.println("N: " + valueToSet);
            // Node needs to be created
            curator.create().creatingParentsIfNeeded().forPath(fullPathOfValueInZookeeper, valueToSet.getBytes());
        }
    } catch (Exception e) {
        throw new IllegalStateException("Error", e);
    }
}
I'm expecting the cache listener to first hear a node-added event with "some string4" and then a node-updated event with "some string5".
Instead, I only receive a node-added event with the value "some string5".
Looking at the logs, both commands are executed: "N: some string4" and "E: some string5" are both logged, and the final value in ZooKeeper is correct ("some string5"). So why does the Curator cache see only a single event?

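Neither ZooKeeper nor Curator's TreeCache guarantees that you will see every successive change to a node. ZooKeeper watches are one-time triggers. After a watch fires, the client has to read the node again to re-register the watch, and any changes made in the meantime are not delivered as separate events. The read simply returns the current value. What is guaranteed is that you won't see successive changes in a different order from what other clients would see.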
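The coalescing can be reproduced without a server. The sketch below is a toy model, not the ZooKeeper or Curator API. `ToyNode`, `getDataAndWatch` and `takeNotification` are hypothetical names, and the model only mimics the one-shot watch rule. Both writes land before the client re-reads the node, so the single notification yields only the latest value.

```java
import java.util.ArrayList;
import java.util.List;

class WatchCoalescingDemo {
    // Hypothetical toy znode with a one-shot watch (NOT the ZooKeeper API).
    static final class ToyNode {
        private String data;
        private boolean watchArmed;
        private boolean notificationPending;

        void setData(String value) {
            data = value;
            if (watchArmed) {
                watchArmed = false;          // one-shot: this change consumes the watch
                notificationPending = true;  // exactly one notification is queued
            }
            // With no watch armed, the change happens silently.
        }

        // Reads the current value and re-arms the watch, like a getData() with a watcher.
        String getDataAndWatch() {
            watchArmed = true;
            return data;
        }

        boolean takeNotification() {
            boolean pending = notificationPending;
            notificationPending = false;
            return pending;
        }
    }

    // Returns every value the client observed through notifications.
    static List<String> run() {
        ToyNode node = new ToyNode();
        List<String> seen = new ArrayList<>();
        node.getDataAndWatch();        // client registers its watch
        node.setData("some string4");  // fires the watch
        node.setData("some string5");  // lands before the client re-reads: no new event
        while (node.takeNotification()) {
            seen.add(node.getDataAndWatch()); // the re-read only sees the latest value
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [some string5]
    }
}
```

This matches what the test observes: one event carrying "some string5".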


FirestoreException: Backend ended Listen Stream

I'm trying to use Firestore in order to set up realtime listeners for a collection. Whenever a document is added, modified, or deleted in a collection, I want the listener to be called. My code is currently working for one collection, but when I try the same code on a larger collection, it fails with the error:
Listen failed: com.google.cloud.firestore.FirestoreException: Backend ended Listen stream: The datastore operation timed out, or the data was temporarily unavailable.
Here's my actual listener code:
/**
 * Sets up a listener at the given collection reference. When changes are made in this collection, it writes a flat
 * text file for import into backend.
 * @param collectionReference The Collection Reference that we want to listen to for changes.
 */
public static void listenToCollection(CollectionReference collectionReference) {
    AtomicBoolean initialUpdate = new AtomicBoolean(true);
    System.out.println("Initializing listener for: " + collectionReference.getId());
    collectionReference.addSnapshotListener(new EventListener<QuerySnapshot>() {
        @Override
        public void onEvent(@Nullable QuerySnapshot queryDocumentSnapshots, @Nullable FirestoreException e) {
            // Error Handling
            if (e != null) {
                System.err.println("Listen failed: " + e);
                return;
            }
            // If this is the first time this function is called, it's simply reading everything in the collection
            // We don't care about the initial value, only the updates, so we simply ignore the first call
            if (initialUpdate.get()) {
                initialUpdate.set(false);
                System.out.println("Initial update complete...\nListener active for " + collectionReference.getId() + "...");
                return;
            }
            // A document has changed, propagate this back to backend by writing text file.
            for (DocumentChange dc : queryDocumentSnapshots.getDocumentChanges()) {
                String docId = dc.getDocument().getId();
                Map<String, Object> docData = dc.getDocument().getData();
                String folderPath = createFolderPath(collectionReference, docId, docData);
                switch (dc.getType()) {
                    case ADDED:
                        System.out.println("Document Created: " + docId);
                        writeMapToFile(docData, folderPath, "CREATE");
                        break;
                    case MODIFIED:
                        System.out.println("Document Updated: " + docId);
                        writeMapToFile(docData, folderPath, "UPDATE");
                        break;
                    case REMOVED:
                        System.out.println("Document Deleted: " + docId);
                        writeMapToFile(docData, folderPath, "DELETE");
                        break;
                }
            }
        }
    });
}
It seems to me that the collection is too large, and the initial download of the collection is timing out. Is there some sort of workaround I can use to get updates to this collection in real time?
I reached out to the Firebase team, and they're currently getting back to me on the issue. In the meantime, I was able to reduce the size of my listener by querying the collection based on a Last Updated timestamp attribute. I only looked at documents that were recently updated, and had my app change this attribute whenever a change was made.

Apache Kafka System Error Handling

We are trying to implement Kafka as our message broker solution. We are deploying our Spring Boot microservices in IBM Bluemix, whose internal message broker implementation is Kafka version 0.10. Since my experience is more on the JMS/ActiveMQ end, I was wondering what the ideal way is to handle system-level errors in the Java consumers.
Here is how we have implemented it currently:
Consumer properties
We are using the default properties for
Kafka Consumer
We are spinning up 3 threads per topic, all with the same groupId, i.e. one KafkaConsumer instance per thread. We have only one partition as of now. The consumer code looks like this in the constructor of the thread class:
kafkaConsumer = new KafkaConsumer<String, String>(properties);
final List<String> topicList = new ArrayList<String>();
kafkaConsumer.subscribe(topicList, new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(final Collection<TopicPartition> partitions) {
    }

    @Override
    public void onPartitionsAssigned(final Collection<TopicPartition> partitions) {
        try {
            logger.info("Partitions assigned, consumer seeking to end.");
            for (final TopicPartition partition : partitions) {
                final long position = kafkaConsumer.position(partition);
                logger.info("current Position: " + position);
                logger.info("Seeking to end...");
                logger.info("Seek from the current position: " + kafkaConsumer.position(partition));
                kafkaConsumer.seek(partition, position);
            }
            logger.info("Consumer can now begin consuming messages.");
        } catch (final Exception e) {
            logger.error("Error while seeking after partition assignment.", e);
        }
    }
});
The actual reading happens in the run method of the thread:
try {
    // Poll on the Kafka consumer every second.
    final ConsumerRecords<String, String> records = kafkaConsumer.poll(1000);
    // Iterate through all the messages received and print their content.
    for (final TopicPartition partition : records.partitions()) {
        final List<ConsumerRecord<String, String>> partitionRecords = records.records(partition);
        logger.info("consumer is alive and is processing " + partitionRecords.size() + " records");
        for (final ConsumerRecord<String, String> record : partitionRecords) {
            logger.info("processing topic " + record.topic() + " for key " + record.key() + " on offset " + record.offset());
            final Class<? extends Event> resourceClass = eventProcessors.getResourceClass();
            final Object obj = converter.convertToObject(record.value(), resourceClass);
            if (obj != null) {
                logger.info("Event: " + obj + " acquired by " + Thread.currentThread().getName());
                final CommsEvent event = resourceClass.cast(converter.convertToObject(record.value(), resourceClass));
                final MessageResults results = eventProcessors.processEvent(event);
                if ("Success".equals(results.getStatus())) {
                    // commit the processed message which changes
                    // the offset
                    logger.info("Message processed successfully");
                } else {
                    kafkaConsumer.seek(new TopicPartition(record.topic(), record.partition()), record.offset());
                    logger.error("Error processing message : {} with error : {}, resetting offset to {} ", obj, results.getError().getMessage(), record.offset());
                    // TODO add return
                }
            }
        }
    }
} catch (final Exception e) {
    logger.error("Consumer has failed with exception: " + e, e);
}
You will notice the EventProcessor, a service class that processes each record and, in most cases, commits it to the database. If the processor throws an error (System Exception or ValidationException), we do not commit but programmatically seek to that offset, so that the subsequent poll will return records from that offset for that group id.
The question now is whether this is the right approach. If we get an error and reset the offset, then no other message is processed until that error is fixed. This might work for system errors like not being able to connect to the DB, but if the problem is only with that event and not others to process this one record we wont be able to process any other record. We thought of an ErrorTopic: when we get an error, the consumer publishes that event to the ErrorTopic and meanwhile keeps processing the subsequent events. But it looks like we are trying to bring JMS design concepts (due to my previous experience) into Kafka, and there may be a better way to handle errors in Kafka. Also, reprocessing from the error topic may change the sequence of messages, which we don't want in some scenarios.
Please let me know how others have handled this scenario in their projects while following Kafka conventions.
if the problem is only with that event and not others to process this one record we wont be able to process any other record
That's correct, and your suggestion to use an error topic seems a reasonable one.
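Here is a toy model of the error-topic idea with no Kafka dependency. `Rec` and `consume` are hypothetical names, the error topic is just a list, and the return value is the offset that would be committed. In Kafka the committed offset is the next offset to read, i.e. the last processed offset + 1. A real consumer would send the parked record with a KafkaProducer and commit with commitSync().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class ErrorTopicSketch {
    // Minimal stand-in for a consumed record (hypothetical, not Kafka's ConsumerRecord).
    static final class Rec {
        final long offset;
        final String value;

        Rec(long offset, String value) {
            this.offset = offset;
            this.value = value;
        }
    }

    // Processes one partition's records in order, starting from the consumer's current
    // position. A record that fails (returns false or throws) is appended to errorTopic
    // instead of being retried in place, so later records are not blocked.
    // Returns the offset to commit: the next offset to read.
    static long consume(long position, List<Rec> records, Predicate<String> processor, List<Rec> errorTopic) {
        long nextOffset = position;
        for (Rec r : records) {
            boolean ok;
            try {
                ok = processor.test(r.value);
            } catch (RuntimeException e) {
                ok = false;
            }
            if (!ok) {
                errorTopic.add(r); // in a real consumer: a producer send to the error topic
            }
            nextOffset = r.offset + 1; // move past the record whether it succeeded or was parked
        }
        return nextOffset;
    }

    public static void main(String[] args) {
        List<Rec> partition = Arrays.asList(new Rec(0, "ok"), new Rec(1, "bad"), new Rec(2, "ok"));
        List<Rec> errorTopic = new ArrayList<>();
        long committed = consume(0, partition, v -> !v.equals("bad"), errorTopic);
        System.out.println("commit " + committed + ", parked records: " + errorTopic.size()); // commit 3, parked records: 1
    }
}
```

The sketch gives up strict ordering in exchange for progress. Record 1 would be reprocessed after record 2, which is the ordering concern raised in the question.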
I also noticed that with your handling of onPartitionsAssigned you essentially do not use the consumer's committed offset, as it seems you always seek to the end.
If you want to restart from the last successfully committed offset, you should not perform a seek.
Finally, though it looks like you already know this, having 3 consumers in the same group subscribed to a single partition means that 2 out of 3 will be idle.

Android: Parse.com concurrency issue with findInBackground()

I am using Parse.com as a backend for my app. The local database from Parse seems to be very easy to use, so I decided to use it.
I want to create a database with Name and PhoneNumber. That is easy: just create a new ParseObject and call pinInBackground(). But it gets more complicated when I want to avoid duplicate numbers. First I need to check whether the number already exists in the database, and then add the new number if it doesn't exist.
The method to do this is:
public void putPerson(final String name, final String phoneNumber, final boolean isFav) {
    // Verify if there is any person with the same phone number
    ParseQuery<ParseObject> query = ParseQuery.getQuery(ParseClass.PERSON_CLASS);
    query.whereEqualTo(ParseKey.PERSON_PHONE_NUMBER_KEY, phoneNumber);
    query.findInBackground(new FindCallback<ParseObject>() {
        @Override
        public void done(List<ParseObject> personList, ParseException e) {
            if (e == null) {
                if (personList.isEmpty()) {
                    // If there is not any person with the same phone number add person
                    ParseObject person = new ParseObject(ParseClass.PERSON_CLASS);
                    person.put(ParseKey.PERSON_NAME_KEY, name);
                    person.put(ParseKey.PERSON_PHONE_NUMBER_KEY, phoneNumber);
                    person.put(ParseKey.PERSON_FAVORITE_KEY, isFav);
                    person.pinInBackground();
                    Log.d(TAG, "Person:" + phoneNumber + " was added.");
                } else {
                    Log.d(TAG, "Warning: " + "Person with the number " + phoneNumber + " already exists.");
                }
            } else {
                Log.d(TAG, "Error: " + e.getMessage());
            }
        }
    });
}
Let's say I want to add 3 people to the database:
ParseLocalDataStore.getInstance().putPerson("Jack", "0741234567", false);
ParseLocalDataStore.getInstance().putPerson("John", "0747654321", false);
ParseLocalDataStore.getInstance().putPerson("Jack", "0741234567", false);
ParseLocalDataStore.getInstance().getPerson(); // Get all persons from database
Notice that the first and third people have the same number, so the third shouldn't be added to the database, but...
The logcat after this is:
12-26 15:37:55.424 16408-16408/D/MGParseLocalDataStore: Person:0741234567 was added.
12-26 15:37:55.424 16408-16408/D/MGParseLocalDataStore: Person:0747654321 was added.
12-26 15:37:55.484 16408-16408/D/MGParseLocalDataStore: Person:0741234567 was added.
12-26 15:37:55.494 16408-16408/D/MGParseLocalDataStore: Person database is empty
The last line from logcat is from the method that shows me all persons from database:
public void getPerson() {
    ParseQuery<ParseObject> query = ParseQuery.getQuery(ParseClass.PERSON_CLASS);
    query.findInBackground(new FindCallback<ParseObject>() {
        @Override
        public void done(List<ParseObject> personList, ParseException e) {
            if (e == null) {
                if (personList.isEmpty()) {
                    Log.d(TAG, "Person database is empty");
                } else {
                    for (ParseObject p : personList) {
                        Log.d(TAG, p.getString(ParseKey.PERSON_PHONE_NUMBER_KEY));
                    }
                }
            } else {
                Log.d(TAG, "Error: " + e.getMessage());
            }
        }
    });
}
So there are two problems:
1. The third number is added even though I checked whether it already exists.
2. The method that shows me all persons tells me the database is empty, even though logcat shows that 3 persons were added.
I think the problem is the findInBackground() method, which does all the work in another thread.
Is there any solution to this problem?
Both of your problems are a result of asynchronous work. If you call the putPerson method twice, both calls run near-simultaneously in separate background threads. Both find queries will most likely return at almost the same time, and definitely before the first call has saved the new person.
In your example, the getPerson call also returns before the background threads have been able to save your three people.
Your problem is not really related to Parse or localDataStore, but is a typical concurrency issue. You need to rethink how you handle concurrency in your app.
As long as this is only a local issue, you can impose a sequential structure with, e.g., the Bolts framework (which is already part of your app since you're using Parse). But if calls to putPerson are made from multiple places, you will always face this problem, and you'd have to find other solutions or workarounds to handle concurrency.
Concurrency is a big topic which you should spend some time studying.
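The answer suggests Bolts task chaining. The sketch below shows the same idea with a different, swapped-in tool: a plain single-thread `ExecutorService` that runs every check-then-insert one at a time, in call order. `SerializedPersonStore` is hypothetical, and the map is an in-memory stand-in for the Parse local datastore, not the Parse API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SerializedPersonStore {
    // Hypothetical in-memory stand-in for the Parse local datastore (not the Parse API).
    // Only the worker thread touches this map.
    private final Map<String, String> nameByPhone = new LinkedHashMap<>();
    // One worker thread runs tasks one at a time, in submission order, so each
    // check-then-insert finishes before the next one starts.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    void putPerson(String name, String phoneNumber) {
        worker.execute(() -> {
            if (nameByPhone.containsKey(phoneNumber)) {
                System.out.println("Person with the number " + phoneNumber + " already exists.");
            } else {
                nameByPhone.put(phoneNumber, name);
                System.out.println("Person:" + phoneNumber + " was added.");
            }
        });
    }

    // Queued behind every earlier putPerson call, so the snapshot includes all of them.
    Map<String, String> getPersons() {
        Callable<Map<String, String>> snapshot = () -> new LinkedHashMap<>(nameByPhone);
        try {
            return worker.submit(snapshot).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void close() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        SerializedPersonStore store = new SerializedPersonStore();
        store.putPerson("Jack", "0741234567");
        store.putPerson("John", "0747654321");
        store.putPerson("Jack", "0741234567");
        System.out.println(store.getPersons()); // {0741234567=Jack, 0747654321=John}
        store.close();
    }
}
```

Because the read is queued on the same worker, it can no longer run ahead of the writes, which was the second problem in the question.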

Using a Commonj Work Manager to send Asynchronous HTTP calls

I switched from making sequential HTTP calls to 4 REST services to making 4 simultaneous calls using a CommonJ work manager task executor. I'm using WebLogic 12c. The new code works in my development environment, but in our test environment under load, and occasionally when not under load, the results map is not populated with all of the results. The logging suggests that each work item did receive its results, though. Could this be a problem with the ConcurrentHashMap?
In this example from IBM, they use their own version of Work with a getData() method, although it doesn't look like that method really exists in their class definition. I had followed a different example that just used the Work class but didn't show how to get the data out of those threads into the main thread. Should I be using execute() instead of schedule()? The API doesn't appear to be well documented. The stuckthreadtimeout is sufficiently high. component.processInbound() actually contains the code for the HTTP call, but I don't think the problem is there, because I can switch back to the synchronous version of the class below without any issues.
My code:
public class WorkManagerAsyncLinkedComponentRouter implements
        MessageDispatcher<Object, Object> {
    private List<Component<Object, Object>> components;
    protected ConcurrentHashMap<String, Object> workItemsResultsMap;
    protected ConcurrentHashMap<String, Exception> componentExceptionsInThreads;

    //components is populated at this point with one component for each REST call to be made.
    public Object route(final Object message) throws RouterException {
        try {
            workItemsResultsMap = new ConcurrentHashMap<String, Object>();
            componentExceptionsInThreads = new ConcurrentHashMap<String, Exception>();
            final String parentThreadID = Thread.currentThread().getName();
            List<WorkItem> producerWorkItems = new ArrayList<WorkItem>();
            for (final Component<Object, Object> component : this.components) {
                producerWorkItems.add(workManagerTaskExecutor.schedule(new Work() {
                    public void run() {
                        //ExecuteThread th = (ExecuteThread) Thread.currentThread();
                        LOG.info("Child thread " + Thread.currentThread().getName() + " Parent thread: " + parentThreadID + " Executing work item for: " + component.getName());
                        try {
                            Object returnObj = component.processInbound(message);
                            if (returnObj == null)
                                LOG.info("Object returned to work item is null, not adding to producer components results map, for this producer: "
                                        + component.getName());
                            else {
                                LOG.info("Added producer component thread result for: "
                                        + component.getName());
                                workItemsResultsMap.put(component.getName(), returnObj);
                            }
                            LOG.info("Finished executing work item for: " + component.getName());
                        } catch (Exception e) {
                            componentExceptionsInThreads.put(component.getName(), e);
                        }
                    }

                    // commonj.work.Work also requires these two methods
                    public boolean isDaemon() {
                        return false;
                    }

                    public void release() {
                    }
                }));
            } // end loop over producer components
            // Block until all items are done
            workManagerTaskExecutor.waitForAll(producerWorkItems, stuckThreadTimeout);
            LOG.info("Finished waiting for all producer component threads.");
            if (componentExceptionsInThreads != null
                    && componentExceptionsInThreads.size() > 0) {
                // ...
            }
            List<Object> resultsList = new ArrayList<Object>(workItemsResultsMap.values());
            if (resultsList.size() == 0)
                throw new RouterException(
                        "The producer thread results are all empty. The threads were likely not created. In testing this was observed when either 1) the system was almost out of memory (perhaps there is not enough memory to create a new thread for each producer, for this REST request), or 2) timeouts were reached for all producers.");
            //** The problem is identified here. The results in the ConcurrentHashMap aren't the number expected.
            if (workItemsResultsMap.size() != this.components.size()) {
                StringBuilder sb = new StringBuilder();
                for (String str : workItemsResultsMap.keySet()) {
                    sb.append(str + " ");
                }
                throw new RouterException(
                        "Did not receive results from all threads within the thread timeout period. Only retrieved:"
                                + sb.toString());
            }
            LOG.info("Returning " + String.valueOf(resultsList.size()) + " results.");
            LOG.debug("List of returned feeds: " + String.valueOf(resultsList));
            return resultsList;
        } catch (WorkException | InterruptedException e) {
            throw new RouterException("Error while scheduling or waiting for work items: " + e.getMessage());
        }
    }
}
I ended up cloning the DOM document used as a parameter. There must be some downstream code that has side effects on the parameter.
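The fix described above can be sketched in plain Java under a few assumptions. A `java.util.concurrent` `ExecutorService` stands in for the CommonJ work manager. `processInbound` is a hypothetical component that writes to the document it receives. `DefensiveCopyRouter` and its helpers are illustrative names. Each task gets its own deep copy made with `Document.importNode`, and the copies are made on the calling thread, because DOM implementations are not guaranteed to be thread-safe. The results map is also local to each call rather than an instance field. Whether that matters for the original router depends on whether one instance is shared across concurrent requests.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DefensiveCopyRouter {
    // Deep-copies a DOM document so one worker can mutate it without affecting the others.
    static Document copyOf(Document original) throws ParserConfigurationException {
        Document copy = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        copy.appendChild(copy.importNode(original.getDocumentElement(), true));
        return copy;
    }

    // Hypothetical stand-in for component.processInbound(): it writes to its input,
    // the kind of side effect that breaks when several threads share one document.
    static String processInbound(String componentName, Document message) {
        Element root = message.getDocumentElement();
        root.setAttribute("target", componentName);
        return root.getAttribute("target");
    }

    // Runs one task per component, each on its own copy; results live in a local map.
    static Map<String, String> route(Document message, List<String> components) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(components.size());
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (String name : components) {
                Document own = copyOf(message); // copied on the calling thread, before scheduling
                futures.put(name, pool.submit(() -> processInbound(name, own)));
            }
            Map<String, String> results = new HashMap<>();
            for (Map.Entry<String, Future<String>> entry : futures.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get(30, TimeUnit.SECONDS));
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    static Document newMessage() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("request"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document message = newMessage();
        System.out.println(route(message, Arrays.asList("a", "b", "c", "d")));
        System.out.println(message.getDocumentElement().hasAttribute("target")); // false
    }
}
```

Each component sees only its own writes, and the caller's original document stays untouched.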

Where to put index-re-aliasing when re-indexing in the background?

I'm trying to re-index an Elasticsearch index with Java:
// reindex all documents from the old into the new index
QueryBuilder qb = QueryBuilders.matchAllQuery();
SearchResponse scrollResp = client.prepareSearch("my_index").setSearchType(SearchType.SCAN).setScroll(new TimeValue(600000)).setQuery(qb).setSize(100).execute().actionGet();
while (true) {
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();
    final int documentFoundCount = scrollResp.getHits().getHits().length;
    // Break condition: No hits are returned
    if (documentFoundCount == 0) {
        break;
    }
    // otherwise add all documents which are found (in this scroll-search) to a bulk operation for reindexing.
    logger.info("Found {} documents in the scroll search, re-indexing them via bulk now.", documentFoundCount);
    BulkRequestBuilder bulk = client.prepareBulk();
    for (SearchHit hit : scrollResp.getHits()) {
        bulk.add(new IndexRequest(newIndexName, hit.getType()).source(hit.getSource()));
    }
    bulk.execute(new ActionListener<BulkResponse>() {
        @Override public void onResponse(BulkResponse bulkItemResponses) {
            logger.info("Reindexed {} documents from '{}' to '{}'.", bulkItemResponses.getItems().length, currentIndexName, newIndexName);
        }
        @Override public void onFailure(Throwable e) {
            logger.error("Could not complete the index re-aliasing.", e);
        }
    });
}
// these following lines should only be executed if the re-indexing was successful for _all_ documents.
logger.info("Finished re-indexing all documents, now setting the aliases from the old to the new index.");
try {
    client.admin().indices().aliases(new IndicesAliasesRequest().removeAlias(currentIndexName, "my_index").addAlias("my_index", newIndexName)).get();
    // finally, delete the old index
    client.admin().indices().delete(new DeleteIndexRequest(currentIndexName)).actionGet();
} catch (InterruptedException | ExecutionException e) {
    logger.error("Could not complete the index re-aliasing.", e);
}
In general, this works, but the approach has one problem:
If there is a failure during re-indexing, e.g. it takes too long and is stopped by some transaction watch (it runs during EJB startup), the alias is still switched and the old index is removed anyway.
How can I switch the alias if and only if all bulk requests were successful?
You're not waiting until the bulk request finishes. If you call execute() without actionGet(), the request runs asynchronously, which means you start changing aliases and deleting indices before the new index is completely built.
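With the Elasticsearch client, the direct version is to call `bulk.execute().actionGet()` in the loop and check `BulkResponse.hasFailures()` before touching the aliases. If the bulk requests should stay asynchronous, the alias switch has to wait for all of them. The sketch below models that gate without Elasticsearch. Each `Supplier<Boolean>` is a hypothetical stand-in for one bulk request that returns false when its response reported failures, and the `Runnable` stands in for the alias switch plus delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;

class ReindexGate {
    // Starts every (stand-in) bulk request asynchronously, then waits for all of them
    // before deciding whether the alias switch may run.
    static boolean reindexThenSwitch(List<Supplier<Boolean>> bulkRequests, Runnable switchAliasAndDeleteOld) {
        List<CompletableFuture<Boolean>> pending = new ArrayList<>();
        for (Supplier<Boolean> bulk : bulkRequests) {
            pending.add(CompletableFuture.supplyAsync(bulk)); // fire and continue, like execute(listener)
        }
        boolean allSucceeded = true;
        for (CompletableFuture<Boolean> f : pending) {
            try {
                allSucceeded &= f.join(); // blocks, like actionGet(); every future is awaited
            } catch (CompletionException e) {
                allSucceeded = false;     // the request itself failed, e.g. timed out
            }
        }
        if (allSucceeded) {
            switchAliasAndDeleteOld.run(); // only reached when every bulk request succeeded
        }
        return allSucceeded;
    }

    public static void main(String[] args) {
        List<Supplier<Boolean>> bulks = new ArrayList<>();
        bulks.add(() -> true);
        bulks.add(() -> { throw new IllegalStateException("bulk timed out"); });
        boolean switched = reindexThenSwitch(bulks, () -> System.out.println("aliases switched"));
        System.out.println("switched: " + switched); // switched: false
    }
}
```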
client.admin().indices().aliases(new IndicesAliasesRequest().removeAlias(currentIndexName, "my_index").addAlias("my_index", newIndexName)).get();
This should end with execute().actionGet() rather than get(). That is probably why your alias is not getting set.

