How to measure latency and throughput in a Storm topology - java

I'm learning Storm with the example ExclamationTopology. I want to measure the latency (the time it takes to add !!! to a word) of a bolt and the throughput (say, how many words pass through a bolt per second).
From here, I can count the number of words and how many times a bolt is executed:
_countMetric = new CountMetric();
_wordCountMetric = new MultiCountMetric();
context.registerMetric("execute_count", _countMetric, 5);
context.registerMetric("word_count", _wordCountMetric, 60);
I know that the Storm UI gives Process Latency and Execute Latency and this post gives a good explanation of what they are.
However, I want to log the latency of every execution of each bolt, and use this information along with the word_count to calculate the throughput.
How can I use Storm Metrics to accomplish this?

While your question is straightforward and will surely be of interest to many people, its answer is not as trivial as it should be. First of all, we need to clarify what exactly we want to measure. Throughput and latency are terms that are easy to understand, but things get more complicated in Storm's distributed environment.
As depicted in this excellent blog post, each Storm supervisor runs at least three threads that fulfill different tasks. The Worker Receiver Thread waits for incoming tuples and aggregates them into batches before they are sent to the Worker Executor Thread. The executor thread contains the user logic (in your case the ExclamationBolt) and a sender that takes care of the outgoing messages. Finally, on every supervisor node there is a Worker Send Thread that aggregates the messages coming from all executors and sends them out over the network.
Each of those threads has its own latency and throughput. For the sender and receiver threads, these depend largely on the buffer sizes, which you can adjust. In your case you only want to measure the latency and throughput of one (executor) bolt. This is possible, but keep in mind that the other threads have an effect on that bolt as well.
My approach:
To obtain latency and throughput, I used the old Storm built-in metrics. Because I found the documentation not very clear, I want to draw a line here: we are not using the new Storm Metrics API v2 and we are not using Cluster Metrics.
Activate the Storm metrics logging by placing the following in your storm.yaml:
topology.metrics.consumer.register:
  - class: "org.apache.storm.metric.LoggingMetricsConsumer"
    parallelism.hint: 1
You can set the reporting interval with: topology.builtin.metrics.bucket.size.secs: 10
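If you prefer to keep this in code instead of storm.yaml, the same consumer can also be registered programmatically on the topology configuration; a minimal sketch (the 10-second bucket size mirrors the yaml setting above):

import org.apache.storm.Config;
import org.apache.storm.metric.LoggingMetricsConsumer;

Config conf = new Config();
// log all built-in metrics through one LoggingMetricsConsumer task
conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
// report metrics every 10 seconds instead of the default 60
conf.put(Config.TOPOLOGY_BUILTIN_METRICS_BUCKET_SIZE_SECS, 10);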
Run your topology. All metrics are logged every 10 seconds into a dedicated metrics logfile. It is not trivial to find this logfile: Storm creates a LoggingMetricsConsumer bolt and distributes it among the cluster, so on the node that runs this bolt you should find the corresponding metrics file in the Storm logs.
This metrics file contains, for each executor, the metrics you are looking for, such as complete-latency, execute-latency and so on. For throughput I would use the queue metrics, which contain e.g. arrival_rate_secs as an estimate of how many tuples are inserted per second. Keep in mind the multiple threads that are executed on every supervisor.
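If you additionally want to log the latency of every execution of your bolt yourself (as asked), one option with the old metrics API is a ReducedMetric with a MeanReducer next to the CountMetric. This is only a hedged sketch; the field and metric names are made up for illustration:

import org.apache.storm.metric.api.CountMetric;
import org.apache.storm.metric.api.MeanReducer;
import org.apache.storm.metric.api.ReducedMetric;

// fields of the bolt (names are illustrative)
private transient CountMetric _executeCount;
private transient ReducedMetric _executeLatency;

// in prepare(): register both metrics with a 10-second reporting bucket
_executeCount = new CountMetric();
_executeLatency = new ReducedMetric(new MeanReducer());
context.registerMetric("execute_count", _executeCount, 10);
context.registerMetric("execute_latency_ms", _executeLatency, 10);

// in execute(): time the actual work and update both metrics
long start = System.currentTimeMillis();
// ... append "!!!" and emit ...
_executeCount.incr();
_executeLatency.update(System.currentTimeMillis() - start);

Dividing execute_count by the bucket size (here 10 seconds) gives an estimate of the bolt's throughput in tuples per second.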
Good luck!

Related

Is it required to use jp@gc - Throughput Shaping Timer when you are using bzm - Concurrency Thread Group in JMeter?

Good day!
I'm using JMeter to do load testing. It's my first time using this tool.
I'm confused about some aspects of JMeter.
I will be using bzm - Concurrency Thread Group to simulate traffic to the server. Based on the documentation, it seems to be required to use it along with jp@gc - Throughput Shaping Timer.
However, I'm thinking of not using it. Will there be any problem during my test?
bzm - Concurrency Thread Group
Not necessarily.
Concurrency thread group is responsible for starting/stopping threads (you can think of them as of virtual users), like "I want to have 100 concurrent users for 10 minutes"
Throughput shaping timer is responsible for producing throughput, the load in terms of requests per second, like "I want to have 100 requests per second for 10 minutes"
So:
When you operate with "users" you cannot guarantee the number of requests per second which will be generated (see What is the Relationship Between Users and Hits Per Second? for more details if needed)
When you operate with "throughput" you cannot guarantee that the number of users will be sufficient for conducting the required load.
So you don't have to use the Throughput Shaping Timer. You can use it if you want to reach and maintain a certain load in requests per second and want to make sure that the number of threads is sufficient: the two elements can be connected via the Feedback Function, so JMeter will be able to kick off new threads if the current amount is not sufficient for conducting the required load (as sketched below).
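As an illustration only (the timer name tst-name and the thread counts are placeholders), the Feedback Function usually goes into the Concurrency Thread Group's Target Concurrency field and references the Throughput Shaping Timer by name:

${__tstFeedback(tst-name,1,100,10)}

If I recall the plugin's convention correctly, the arguments are the Shaping Timer name, the starting number of threads, the maximum number of threads, and the number of spare threads to keep in reserve.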

Storm Emit Execute Latency

I have a Storm topology running in a distributed environment across 4 Unix nodes.
I have a JMSSpout that receives a message and then forwards it onto a ParseBolt that will parse the raw message and create an object.
To help measure latency my JMSSpout emits the current time as a value and then when the ParseBolt receives this it will get the current time again and take the difference as the latency.
Using this approach I am seeing 200+ ms which doesn't sound right at all. Does anyone have an idea with regards to why this might be?
It's probably a threading issue. Storm uses the same thread for all spout nextTuple() calls and tuples emitted aren't processed until the nextTuple() call ends. There's also a very tight loop that repeatedly calls the nextTuple() method and it can consume a lot of cycles if you don't put at least a short sleep in the nextTuple() implementation.
Try adding a sleep(10) and emitting only one tuple per nextTuple().
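A minimal sketch of what that suggestion could look like, assuming a Storm 1.x-style spout API (the class, field, and stream names are made up; the 10 ms pause and the single emit per call are the points being illustrated):

import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class TimestampSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(10);                                          // avoid busy-spinning the spout thread
        collector.emit(new Values(System.currentTimeMillis()));   // one tuple per call
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("emitTime"));
    }
}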

How does Storm handle nextTuple in the Bolt

I am a newbie to Storm and have created a program to read incrementing numbers for a certain time. I have used a counter in the Spout, and in the nextTuple() method the counter is emitted and incremented
_collector.emit(new Values(new Integer(currentNumber++)));
/* how this method is being continuously called...*/
and the execute() method of the Bolt has
public void execute(Tuple input) {
int number = input.getInteger(0);
logger.info("This number is (" + number + ")");
_outputCollector.ack(input);
}
/*this part I am clear as Bolt would receive the input from Spout*/
In my Main class execution I have the following code
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("NumberSpout", new NumberSpout());
builder.setBolt("NumberBolt", new PrimeNumberBolt())
.shuffleGrouping("NumberSpout");
Config config = new Config();
LocalCluster localCluster = new LocalCluster();
localCluster.submitTopology("NumberTest", config, builder.createTopology());
Utils.sleep(10000);
localCluster.killTopology("NumberTest");
localCluster.shutdown();
The program works perfectly fine. What I am currently looking at is how the Storm framework internally calls the nextTuple() method continuously. I am sure my understanding is missing something here, and due to this gap I am unable to connect to the internal logic of this framework.
Can anyone help me understand this portion clearly? It would be a great help, as I will have to implement this concept in my project. If I am conceptually clear here I can make significant progress. I'd appreciate it if anyone can quickly assist me here. Awaiting responses...
how does the Storm framework internally calls the nextTuple() method continuously.
I believe this actually involves a very detailed discussion about the entire life cycle of a Storm topology, as well as a clear concept of the different entities like workers, executors, tasks, etc. The actual submission of a topology is carried out by the StormSubmitter class with its submitTopology method.
The very first thing it does is start uploading the jar using Nimbus's Thrift interface, and then it calls submitTopology, which eventually submits the topology to Nimbus. Nimbus then starts by normalizing the topology (from the doc: "The main purpose of normalization is to ensure that every single task will have the same serialization registrations, which is critical for getting serialization working correctly"), followed by serialization, ZooKeeper handshaking, supervisor and worker process startup and so on. It is too broad to discuss here, but if you really want to dig deeper you can go through the lifecycle of a Storm topology, which explains nicely the step-by-step actions performed during the entire time. (A quick note from the documentation:)
First a couple of important notes about topologies:
The actual topology that runs is different than the topology the user specifies. The actual topology has implicit streams and an implicit "acker" bolt added to manage the acking framework (used to guarantee data processing).
The implicit topology is created via the system-topology! function. system-topology! is used in two places:
- when Nimbus is creating tasks for the topology
- in the worker so it knows where it needs to route messages to
Now here are a few clues I can try to share ...
Spouts and Bolts are actually the components which do the real processing (the logic). In Storm terminology they execute as many tasks across the cluster.
From the doc page: Each task corresponds to one thread of execution.
Now, among many others, one typical responsibility of a worker process (read here) in Storm is to monitor whether a topology is active or not and to store that particular state in a variable named storm-active-atom. This variable is used by the tasks to determine whether or not to call the nextTuple method. So as long as your topology is live (you haven't posted your spout code, but assuming it is) and your timer is active (as you said, for a certain time), it will keep calling the nextTuple method. You can dig even further into Storm's acking framework implementation to understand how it recognises and acknowledges a tuple once it is successfully processed, and into Guarantee-message-processing.
I am sure that my understanding is missing something here and due to this gap I am unable to connect to the internal logic of this framework
Having said this, I think it is more important to get a clear understanding of how to work with Storm rather than how Storm works internally at this early stage. For example, instead of learning the internal mechanism of Storm, it is important to realize that if we set a spout to read a file line by line, then it keeps emitting each line using the _collector.emit method until it reaches EOF, and the bolt connected to it receives the same in its execute(Tuple input) method (a small sketch of such a spout follows below).
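A minimal, hedged sketch of such a file-reading spout (the file name and field name are made up; error handling is kept to a bare minimum):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class LineReaderSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private BufferedReader reader;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            reader = new BufferedReader(new FileReader("input.txt"));   // hypothetical file
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            String line = reader.readLine();
            if (line != null) {
                collector.emit(new Values(line));   // one line per nextTuple() call
            }                                       // after EOF, simply emit nothing
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}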
Hope this helps. Share more with us in the future.
Ordinary Spouts
There is a loop in Storm's executor daemon that repeatedly calls nextTuple (as well as ack and fail when appropriate) on the corresponding spout instance.
There is no waiting for tuples to be processed. The spout simply receives a fail call for tuples that did not manage to be processed within the given timeout.
This can easily be simulated with a topology of a fast spout and a slow processing bolt: the spout will receive a lot of fail calls.
See also the ISpout javadoc:
nextTuple, ack, and fail are all called in a tight loop in a single thread in the spout task. When there are no tuples to emit, it is courteous to have nextTuple sleep for a short amount of time (like a single millisecond) so as not to waste too much CPU.
Trident Spouts
The situation is completely different for Trident-spouts:
By default, Trident processes a single batch at a time, waiting for the batch to succeed or fail before trying another batch. You can get significantly higher throughput, and lower latency of processing of each batch, by pipelining the batches. You configure the maximum amount of batches to be processed simultaneously with the topology.max.spout.pending property.
Even while processing multiple batches simultaneously, Trident will order any state updates taking place in the topology among batches.
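For reference, a hedged sketch of how that property is typically set on the topology configuration (the value 5 is only an example):

import org.apache.storm.Config;

Config conf = new Config();
conf.setMaxSpoutPending(5);   // same as setting topology.max.spout.pending: 5 in the config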

concurrent consumers yet ensure order

I have a JMS Queue that is populated at a very high rate ( > 100,000/sec ).
It can happen that there are multiple messages pertaining to the same entity every second as well (several updates to the entity, each update as a different message).
On the other end, I have one consumer that processes this message and sends it to other applications.
Now the whole setup is slowing down, since the consumer is not able to cope with the rate of incoming messages.
Since there is an SLA on the rate at which the consumer processes messages, I have been toying with the idea of having multiple consumers acting in parallel to speed up the process.
So, what I'm thinking of doing is:
Multiple consumers acting independently on the queue.
Each consumer is free to grab any message.
After grabbing a message, make sure it's the latest version of the entity. For this part, I can check with the application that processes this entity.
If it's not the latest, bump the version up and try again.
I have been looking up the Integration patterns, JMS docs so far without success.
I would welcome ideas to tackle this problem in a more elegant way along with any known APIs, patterns in Java world.
ActiveMQ solves this problem with a concept called "Message Groups". While it's not part of the JMS standard, several JMS-related products work similarly. The basic idea is that you assign each message to a "group" which indicates messages that are related and have to be processed in order. Then you set it up so that each group is delivered only to one consumer. Thus you get load balancing between groups but guarantee in-order delivery within a group.
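A hedged sketch of how a producer could tag related messages with such a group via the JMSXGroupID property that ActiveMQ uses for Message Groups (the session, producer, and entity id are assumed to already exist):

import javax.jms.TextMessage;

// all messages sharing the same JMSXGroupID are delivered to the same consumer, in order
TextMessage message = session.createTextMessage("update for entity " + entityId);
message.setStringProperty("JMSXGroupID", "ENTITY-" + entityId);
producer.send(message);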
Most EIP frameworks and ESB's have customizable resequencers. If the amount of entities is not too large you can have a queue per entity and resequence at the beginning.
For those interested in a way to solve this:
Use Recipient List EAI pattern
As the question is about JMS, we can take a look into an example from Apache Camel website.
This approach is different from other patterns like CBR and Selective Consumer because the consumer is not aware of what message it should process.
Let me put this on a real world example:
We have an Order Management System (OMS) which sends off Orders to be processed by the ERP. The Order then goes through 6 steps, and each of those steps publishes an event on the Order_queue announcing the Order's new status. Nothing special here.
The OMS consumes the events from that queue, but MUST process the events of each Order in the very same sequence they were published. The rate of messages published per minute is much greater than the consumer's throughput, hence the delay increases over time.
The solution requirements:
Consume in parallel, including as many consumers as needed to keep queue size in a reasonable amount.
Guarantee that events for each Order are processed in the same publish order.
The implementation:
On the OMS side
The OMS process responsible for sending Orders to the ERP determines the consumer that will process all events of a certain Order and sends the Recipient name along with the Order.
How does this process know who the Recipient should be? Well, you can use different approaches, but we used a very simple one: Round Robin.
On ERP
As it keeps the Recipient's name for each Order, it simply sets up the message to be delivered to the desired Recipient.
On OMS Consumer
We've deployed 4 instances, each one using a different Recipient name and concurrently processing messages.
One could say that we created another bottleneck: the database. But it is not true, since there is no concurrency on the order line.
One drawback is that the OMS process which sends the Orders to the ERP must keep knowledge about how many Recipients are working.
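Since the answer builds on Apache Camel's Recipient List, here is a hedged sketch of what the routing could look like with Camel's recipientList (the queue names and the recipient header are illustrative only, not the actual implementation described above):

import org.apache.camel.builder.RouteBuilder;

public class OrderEventRoute extends RouteBuilder {
    @Override
    public void configure() {
        // the producer stamps each event with a 'recipient' header, e.g. "jms:queue:oms.consumer.1";
        // Camel then forwards the event to that consumer's private queue
        from("jms:queue:Order_queue")
            .recipientList(header("recipient"));
    }
}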

Is there an API that allows ordering event in clustered application?

Given the following facts, is there an existing open-source Java API (possibly as part of some greater product) that implements an algorithm enabling the reproducible ordering of events in a cluster environment?
1) There are N sources of events, each with a unique ID.
2) Each event produced has an ID/timestamp, which, together with its source ID, makes it uniquely identifiable.
3) The IDs can be used to sort the events.
4) There are M application servers receiving those events. M is normally 3.
5) The events can arrive at any one or more of the application servers, in no specific order.
6) The events are processed in batches.
7) The servers have to agree for each batch on the list of events to process.
8) The events each have an earliest and latest batch ID in which they must be processed.
9) They must not be processed earlier, and are "failed" if they cannot be processed before the deadline.
10) The batches are based on the real clock time. For example, one batch per second.
11) The events of a batch are processed when 2 of the 3 servers agree on the list of events to process for that batch (quorum).
12) The "third" server then has to wait until it possesses all the required events before it can process that batch too.
13) Once an event was processed or failed, the source has to be informed.
14) [EDIT] Events from one source must be processed (or failed) in the order of their ID/timestamp, but there is no causality between different sources.
Less formally, I have those servers that receive events. They start with the same initial state, and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP, but "in a bit", so that I have some time to get the servers to agree before the deadline. But I'm not sure if that actually makes any real difference to the algorithms. And if all servers agree on all batches, then they will always be in sync, therefore presenting a consistent view when queried.
While I would be most happy with a Java API, I would accept something else if I can call it from Java. And if there is no open-source API, but a clear algorithm, I would also take that as an answer and try to implement it myself.
Looking at the question and your follow-up, there probably "wasn't" an API to satisfy your requirements. Today you could take a look at Kafka (from LinkedIn):
Apache Kafka
And the general concept of "a log" entity, in what folks like to call 'big data':
The Log: What every software engineer should know about real-time data's unifying abstraction
Actually, for your question I'd begin with the blog about "the log". In my terms, the way it works (and Kafka isn't the only package out there doing log handling) is as follows:
Instead of queue-based message passing / publish-subscribe,
Kafka uses a "log" of messages
Subscribers (or end-points) can consume the log
The log guarantees to be "in-order"; it handles giga-data and is fast
Double check on that guarantee; there's usually a trade-off for reliability
You just read the log; reads are not destructive, since the log is retained according to a retention policy and each consumer keeps track of its own position.
If there's a subscriber (consumer) group, everyone can 'read' the log entry before it eventually expires.
The basic handling (compute) process for the log is a map-reduce-filter model: you read everything really fast, keep only the stuff you want, and process it (reduce) to produce outcome(s).
The downside seems to be that you need clusters and such to make it really shine. Since different servers or sites were mentioned, I think we are still on track. I found it a bit finicky to get up and running with the Apache downloads because they tend to assume non-Windows environments (ho hum).
The other 'fast' option would be
Apache Apollo
Which would need you to do the plumbing for connecting different servers. Since the requirements include ...
servers that receive events. They start with the same initial state, and should keep in sync by agreeing on which event to process in which order. Luckily for me, the events are not to be processed ASAP, but "in a bit", so that I have some time to get the servers to agree before the deadline
I suggest looking at a "Getting Started" example or tutorial with Kafka and then looking at similar ZooKeeper-organised message/log software. Good luck and enjoy!
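To give a feel for the consumer-group model mentioned above, here is a hedged sketch using the plain Kafka Java client (the broker address, topic, and group id are placeholders):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
props.put("group.id", "event-processors");          // all servers join one consumer group
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("events"));   // placeholder topic
    while (true) {
        // records within one partition arrive in order; keying events by source ID
        // keeps per-source order, matching requirement 14) above
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
            System.out.printf("source=%s event=%s%n", record.key(), record.value());
        }
    }
}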
So far I haven't got a clear answer, but I think it would be useful for anyone interested to see what I found.
Here are some theoretical discussions related to the subject:
Dynamic Vector Clocks for Consistent Ordering of Events
Conflict-free Replicated Data Types
One way of making multiple concurrent processes wait for each other, which I could use to synchronize the "batches", is a distributed barrier. One Java implementation seems to be available on top of Hazelcast and another uses ZooKeeper.
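As an illustration of the ZooKeeper variant, a hedged sketch using Apache Curator's DistributedBarrier recipe (the connect string and barrier path are placeholders, and checked exceptions are omitted for brevity):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.barriers.DistributedBarrier;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.newClient(
        "localhost:2181", new ExponentialBackoffRetry(1000, 3));   // placeholder ZooKeeper
client.start();

DistributedBarrier barrier = new DistributedBarrier(client, "/batches/barrier");   // placeholder path
barrier.setBarrier();       // a coordinator raises the barrier for the next batch
// ... every process collects and inserts its events for the current batch ...
barrier.waitOnBarrier();    // all processes block here until the barrier is removed
// the coordinator calls barrier.removeBarrier() once the batch content is agreed on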
One simpler alternative I found is to use a DB. Every process inserts all the events it receives into the DB. Depending on the DB design, this can be fully concurrent and lock-free, as in VoltDB, for example. Then, at a regular interval of one second, some "cron job" runs that selects and marks the events to be processed in the next batch. The job can run on every server. The first to run the job for a batch fixes the set of events, so the others just get to use the list that the first one defined. That way we have a guarantee that all batches contain the same set of events on all servers. And if we can use a complete order over the whole batch, which the cron job could specify itself, then the state of the servers will be kept in sync.
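A hedged sketch of what that claiming step could look like over plain JDBC (the table and column names are made up; connection, nextBatchId, and batchCutoff are assumed to exist):

import java.sql.PreparedStatement;

// claim every not-yet-assigned event that arrived before the cutoff for the next batch;
// whichever server commits this first fixes the batch, later runs simply match no extra rows
try (PreparedStatement ps = connection.prepareStatement(
        "UPDATE events SET batch_id = ? WHERE batch_id IS NULL AND arrival_time <= ?")) {
    ps.setLong(1, nextBatchId);
    ps.setTimestamp(2, batchCutoff);
    int claimed = ps.executeUpdate();   // number of events fixed into this batch
}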
