I have a Hazelcast cluster that performs several calculations for a Java client, triggered from the command line. I need to persist parts of the calculated results on the client system while the nodes are still working. I am going to store parts of the data in Hazelcast's maps. Now I am looking for a way to inform the client that a node has stored data in the map and that the client can start using it. Is there a way to trigger client operations from any Hazelcast node?
Your question is not very clear, but it looks like you could use com.hazelcast.core.EntryListener to trigger a callback that notifies the client when a new entry is stored in the data map.
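For illustration, here is a minimal client-side sketch (Hazelcast 3.x packages, matching the com.hazelcast.core.EntryListener mentioned above; the map name "results" is made up):

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.core.EntryAdapter;
    import com.hazelcast.core.EntryEvent;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.IMap;

    public class ResultListenerClient {
        public static void main(String[] args) {
            // connect as a Hazelcast client; cluster addresses come from the client config
            HazelcastInstance client = HazelcastClient.newHazelcastClient();
            IMap<String, byte[]> results = client.getMap("results");

            // entryAdded fires on the client whenever any member puts a new entry into the map
            results.addEntryListener(new EntryAdapter<String, byte[]>() {
                @Override
                public void entryAdded(EntryEvent<String, byte[]> event) {
                    System.out.println("Partial result available under key " + event.getKey());
                    // process event.getValue(), or fetch it lazily via results.get(event.getKey())
                }
            }, true); // true = include the value in the event
        }
    }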
Your member node can publish some intermediate results (or just a notification message) to a Hazelcast IQueue, ITopic or Ringbuffer.
The flow looks like this:
1. The client registers a listener for, say, the Ringbuffer.
2. The client submits a command to perform on the cluster.
3. A member persists intermediate results to IMaps or any other data structure.
4. The member sends a message to the topic announcing the availability of partial results.
5. The client receives the message and accesses the data in the IMap.
6. The member sends a message when it is done with its task.
Something like that.
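A minimal sketch of that flow, with made-up map and topic names ("partialResults", "resultNotifications"):

    import com.hazelcast.client.HazelcastClient;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.ITopic;

    public class PartialResultFlow {

        // member side: persist a chunk of results, then announce its key on the topic
        static void publishChunk(HazelcastInstance member, String key, Object partialResult) {
            member.getMap("partialResults").put(key, partialResult);
            member.getTopic("resultNotifications").publish(key);
        }

        // client side: listen for announcements and pull the data from the map
        public static void main(String[] args) {
            HazelcastInstance client = HazelcastClient.newHazelcastClient();
            ITopic<String> topic = client.getTopic("resultNotifications");
            topic.addMessageListener(message -> {
                Object chunk = client.getMap("partialResults").get(message.getMessageObject());
                // consume the chunk; remove it from the map once it is no longer needed
            });
        }
    }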
You can find some examples here
Let me know if you have any questions about it.
Cheers,
Vik
There are several paths to solve the problem. The simplest one is using a dedicated IMap or another of Hazelcast's synchronized collections. One can simply write data into such a map and retrieve/remove it after it has been added. But this causes huge overhead, because the data has to be synchronized throughout the cluster. If the data is quite big and the cluster is huge, with a few hundred nodes all over the world or at least the USA, the data will be synchronized over all nodes just to be deleted a few moments later, which also has to be synchronized. Not deleting is not an option, because the data can grow to several GB, which makes synchronizing it even more expensive. The question got answered, but the solution is not suited for every scenario.
I have a network of nodes represented as a graph (or, more specifically, a DAG). The vertices and edges are just IDs pointing to large objects in the cache.
I am using Hazelcast and have these caches:
1. ReferenceObject for the graph
2. IMap for the large objects
3. IMap for the edges in the graph
When I insert a large object, I have an entry listener that updates this graph in the cache. Similarly, when I add edge data, there is also an entry listener that updates the graph.
However, I have one problem: if I create an edge that introduces a cycle, the operation fails (as it's a DAG), but the IMap retains the record.
Any ideas on how I can have transactions across the main thread and the entry listener?
@Pilo, the problem is that an EntryListener listens to events that fire after the data has already been populated in the map. So when you insert the data into your first map and listen to an update event, the data is already in the first map.
You can either:
manually remove the record from the first map if the operation fails on the second one, or
use transactions and make sure that either all of the maps are updated or none are, instead of relying on listeners (see the sketch below).
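A minimal sketch of the transactional option, with hypothetical map names ("edges", "graphs") and placeholder value types:

    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.core.TransactionalMap;
    import com.hazelcast.transaction.TransactionContext;

    public class EdgeWriter {

        // write the edge and the updated graph together, or not at all
        public void addEdge(HazelcastInstance hz, String edgeId, Object edge, Object updatedGraph) {
            TransactionContext ctx = hz.newTransactionContext();
            ctx.beginTransaction();
            try {
                TransactionalMap<String, Object> edges = ctx.getMap("edges");
                TransactionalMap<String, Object> graphs = ctx.getMap("graphs");
                edges.put(edgeId, edge);
                graphs.put("dag", updatedGraph); // run the cycle check before this line and throw to abort
                ctx.commitTransaction();
            } catch (RuntimeException e) {
                ctx.rollbackTransaction(); // neither map keeps the half-applied edge
                throw e;
            }
        }
    }

The trade-off is that the cycle check has to run in the same thread as the write, rather than in the entry listener.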
Though it's a completely different approach, have you looked at Hazelcast Jet? It's a DAG-based event stream processing engine built on top of Hazelcast IMDG. It might fit your use case better and take care of the lower-level stuff for you.
https://jet.hazelcast.org
You would have a Jet cluster, which is also a Hazelcast cluster, but you get all the processing machinery with it. It extends the Java Streams programming model, so you just write your app as if it were a Java stream and run it on the cluster. Something to think about anyway.
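For a rough idea only, here is a minimal pipeline sketch; Jet's API has changed between releases, so the drawFrom/drainTo names below assume the pre-4.0 API that the linked site describes, and the map/list names are made up:

    import com.hazelcast.jet.Jet;
    import com.hazelcast.jet.JetInstance;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    public class GraphJob {
        public static void main(String[] args) {
            // read entries from an IMap, transform them, write the results to an IList
            Pipeline p = Pipeline.create();
            p.drawFrom(Sources.<String, String>map("largeObjects"))
             .map(entry -> entry.getKey() + " processed")
             .drainTo(Sinks.list("results"));

            JetInstance jet = Jet.newJetInstance(); // a Jet member is also a Hazelcast member
            jet.newJob(p).join();
        }
    }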
I've used GigaSpaces in the past and I'd like to know if I can use Ignite in a similar fashion. Specifically, I need to implement a master-worker pattern where one set of process writes objects to the in-memory data grid and another set reads those objects, does some processing, and possibly writes results back to the grid. One important GigaSpaces/JavaSpaces feature I need is leasing. If I write an object to the space and it isn't picked up within a certain time period, it should automatically expire and I should get some kind of notification.
Is Apache Ignite a good match for this use case?
I've worked with GigaSpaces before. What you are looking for is perhaps "continuous queries" in Ignite. They allow you to create a filter for a specific predicate, i.e. checking a field of a new object being written to the grid. Once the filter is evaluated, it triggers a listener that can execute the logic you require and write results or changes back to the grid. You can create as many of these queries as desired and chain them, similar to the "notification container" in GigaSpaces. And, as you would expect, you can control the thread pools for this separately.
As for the master-worker pattern, you can configure client Ignite nodes to write the data and server nodes to store and process it. You can even use other client nodes as remote listeners for data changes, as you mentioned.
Check these links:
https://apacheignite.readme.io/docs/continuous-queries
https://apacheignite.readme.io/docs/clients-vs-servers
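To make that concrete, here is a minimal sketch; the cache names are made up, and the expiry policy is only a rough analogue of a GigaSpaces lease (entries silently expire rather than send a notification):

    import java.util.concurrent.TimeUnit;
    import javax.cache.expiry.CreatedExpiryPolicy;
    import javax.cache.expiry.Duration;
    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.cache.query.ContinuousQuery;

    public class WorkerNode {
        public static void main(String[] args) {
            Ignite ignite = Ignition.start();
            IgniteCache<Long, String> tasks = ignite.getOrCreateCache("tasks");
            IgniteCache<Long, String> results = ignite.getOrCreateCache("results");

            // continuous query: react to every task object the masters write into the grid
            ContinuousQuery<Long, String> qry = new ContinuousQuery<>();
            qry.setLocalListener(events ->
                events.forEach(e -> results.put(e.getKey(), "processed: " + e.getValue())));
            tasks.query(qry); // keep the returned cursor open for as long as you want to listen

            // lease-like behaviour: entries written through this view expire if not handled in time
            IgniteCache<Long, String> leased =
                    tasks.withExpiryPolicy(new CreatedExpiryPolicy(new Duration(TimeUnit.SECONDS, 30)));
            leased.put(1L, "work item");
        }
    }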
We are working on a distributed data processing system, and Hazelcast is one of the components we are using.
We have streaming data coming into the cluster, and we have to process it (update/accumulate, etc.). There is a distributed request map with local entry listeners. We process a new request (update/accumulate in memory) and put it into another distributed map, which is the actual data grid.
Thus we can process each request concurrently without locking. However, putting data into the main data grid might involve a network trip.
Is there a way I can explicitly specify which node gets selected? Basically, I would want to put the entry into the local portion of the data grid map. This should improve overall throughput by avoiding the network trip.
By using a partition-aware key, I can specify that all such keys go to the same partition; however, I am looking to actually 'specify' the partition. Is this possible?
You can create a key for a specific partition. We do this often for testing.
Once you have created such a key for every partition, you can use
map.put("yourkey#partitionkey", value)
Check out the Git repo and look for HazelcastTestSupport.generateKeyOwnedBy(hz).
Important: a partition may belong to a member at one point in time, but partitions can move around in the system, e.g. when a member joins or leaves the cluster, so the solution could be fragile.
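For illustration, here is a sketch of a PartitionAware key; the class and field names are hypothetical:

    import java.io.Serializable;
    import java.util.Objects;
    import com.hazelcast.core.PartitionAware;

    // Entries keyed by RequestKey are routed by partitionKey rather than by the whole key,
    // so you can steer them onto the partition owned by the local member.
    public class RequestKey implements PartitionAware<String>, Serializable {
        private final String requestId;
        private final String partitionKey; // e.g. a key obtained via generateKeyOwnedBy(hz)

        public RequestKey(String requestId, String partitionKey) {
            this.requestId = requestId;
            this.partitionKey = partitionKey;
        }

        @Override
        public String getPartitionKey() {
            return partitionKey;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof RequestKey)) return false;
            RequestKey other = (RequestKey) o;
            return requestId.equals(other.requestId) && partitionKey.equals(other.partitionKey);
        }

        @Override
        public int hashCode() {
            return Objects.hash(requestId, partitionKey);
        }
    }

A put with such a key then lands on whichever member currently owns that partition, subject to the migration caveat above.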
I am looking for a good design pattern for sharding a list in Google App Engine. I have read about and implemented sharded counters as described in the Google docs here, but I am now trying to apply the same principle to a list. Below are my problem and a possible solution - can I please get your input?
Problem:
A user on my system could receive many messages, kind of like in an online chat system. I'd like the server to record all incoming messages (they will contain several fields - from, to, etc.). However, I know from the docs that updating the same entity group often can result in an exception caused by datastore contention. This could happen when one user receives many messages in a short time, causing their entity group to be written to many times. So what about abstracting out the sharded counter example above:
Define, say, five entities/entity groups.
For each message to be added, pick one entity at random, append the message to it, and write it back to the store.
To get the list of messages, read all the entities in and merge them.
OK, some questions on the above:
Most importantly, is this the best way to go about things, or is there a more elegant/more efficient design pattern?
What would be an efficient way to filter the list of messages by one of the fields, say everything after a certain date?
What if I require a sharded set instead? Should I read in all entities and check if the new item already exists on every write? Or just add it as above and then remove duplicates whenever the next request comes in to read?
Why would you want to put all messages in one entity group?
If you don't specify an ancestor, you won't need sharding, but the end user might see some lag when querying the messages due to eventual consistency.
It depends on whether that is an acceptable tradeoff.
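The question doesn't pin down a language, but as a sketch of that no-ancestor approach using the Java low-level Datastore API (kind and property names are made up, and the combined filter on "to" and "date" needs a composite index):

    import java.util.Date;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;
    import com.google.appengine.api.datastore.Query;
    import com.google.appengine.api.datastore.Query.FilterOperator;
    import com.google.appengine.api.datastore.Query.FilterPredicate;

    public class Messages {
        private final DatastoreService ds = DatastoreServiceFactory.getDatastoreService();

        // write: no ancestor/parent key, so there is no single entity group to contend on
        public void record(String to, String from, String body) {
            Entity msg = new Entity("Message");
            msg.setProperty("to", to);
            msg.setProperty("from", from);
            msg.setProperty("body", body);
            msg.setProperty("date", new Date());
            ds.put(msg);
        }

        // read: eventually consistent query, filtered by recipient and date
        public Iterable<Entity> messagesSince(String to, Date since) {
            Query q = new Query("Message")
                    .setFilter(Query.CompositeFilterOperator.and(
                            new FilterPredicate("to", FilterOperator.EQUAL, to),
                            new FilterPredicate("date", FilterOperator.GREATER_THAN, since)))
                    .addSort("date", Query.SortDirection.DESCENDING);
            return ds.prepare(q).asIterable();
        }
    }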
Inside my system I have data with a short lifespan: it stays current only for a short time, but it should still be persisted in the datastore.
This data may also change frequently for each user, for instance every minute.
The number of users is potentially large, and I want to speed up the put/get process for this data by using memcache and delaying persistence to Bigtable.
There is no problem just putting/getting objects by key. But for some use cases I need to retrieve all data from the cache that is still alive, while the API only allows me to get data by key. Hence I need some key holder that knows all keys of the data inside memcache... But any object may be evicted, and then I need to remove its key from the global registry of keys (such an eviction listener doesn't exist in GAE). Storing all these objects in a single list or map is not acceptable for my solution, because each object should have its own eviction time...
Could somebody recommend which way I should go?
It sounds like what you are really attempting to do is have some sort of queue for data that you will be persisting. Memcache is not a good choice for this since, as you've said, it is not reliable (nor is it meant to be). Perhaps you would be better off using Task Queues?
Memcache isn't designed for exhaustive access, and if you need it, you're probably using it the wrong way. Memcache is a sharded hashtable, and as such really isn't designed to be enumerated.
It's not clear from your description exactly what you're trying to do, but it sounds like at the least you need to restructure your data so you're aware of the expected keys at the time you want to write it to the datastore.
I am encountering the very same problem. I might solve it by building a decorator function and wrapping it around the evicting function, so that the key to the entity is automatically deleted from the key directory/placeholder in memcache when you call for eviction.
Something like this:
from google.appengine.api import memcache
from google.appengine.ext import db


def decorate_evict_decorator(key_prefix):
    def evict_decorator(evict):
        def wrapper(self, entity_name_or_id):  # use self/cls if the function is bound to a class
            mem = memcache.Client()
            # key directory layout: {"placeholder": {key_prefix + "|" + entity_name_or_id: key_or_id}}
            placeholder = mem.get("placeholder") or {}  # could use gets with cas to avoid races
            evict(self, entity_name_or_id)
            placeholder.pop(key_prefix + "|" + entity_name_or_id, None)
            mem.set("placeholder", placeholder)
        return wrapper
    return evict_decorator


class car(db.Model):
    car_model = db.StringProperty(required=True)
    company = db.StringProperty(required=True)
    color = db.StringProperty(required=True)
    engine = db.StringProperty()

    @classmethod
    @decorate_evict_decorator("car")
    def evict(cls, car_model):
        pass  # delete process


class engine(db.Model):
    model = db.StringProperty(required=True)
    cylinders = db.IntegerProperty(required=True)
    litres = db.FloatProperty(required=True)
    manufacturer = db.StringProperty(required=True)

    @classmethod
    @decorate_evict_decorator("engine")
    def evict(cls, engine_model):
        pass  # delete process
You could improve on this according to your data structure and flow, and read up on decorators for more background.
You might want to add a cron job to keep your datastore in sync with memcache at a regular interval.