Print from Apache Storm Bolt - java

I'm working my way through the example code of some Storm topologies and bolts, but I'm running into something weird. My goal is to set up Kafka with Storm, so that Storm can process the messages available on the Kafka bus. I have the following bolt defined:
public class ReportBolt extends BaseRichBolt {
private static final long serialVersionUID = 6102304822420418016L;
private Map<String, Long> counts;
private OutputCollector collector;
@Override @SuppressWarnings("rawtypes")
public void prepare(Map stormConf, TopologyContext context, OutputCollector outCollector) {
collector = outCollector;
counts = new HashMap<String, Long>();
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
// terminal bolt = does not emit anything
}
@Override
public void execute(Tuple tuple) {
System.out.println("HELLO " + tuple);
}
@Override
public void cleanup() {
System.out.println("HELLO FINAL");
}
}
In essence, it should just print each Kafka message, and when the cleanup function is called, a different message should appear.
I have looked at the worker logs, and I find the final message (i.e. "HELLO FINAL"), but the Kafka messages with "HELLO" are nowhere to be found. As far as I can tell this should be a simple printer bolt, but I can't see where I'm going wrong. The worker logs indicate I am connected to the Kafka bus (it fetches the offset etc.).
In short, why are my printlns not showing up in the worker logs?
EDIT
public class AckedTopology {
private static final String SPOUT_ID = "monitoring_test_spout";
private static final String REPORT_BOLT_ID = "acking-report-bolt";
private static final String TOPOLOGY_NAME = "monitoring-topology";
public static void main(String[] args) throws Exception {
int numSpoutExecutors = 1;
KafkaSpout kspout = buildKafkaSpout();
ReportBolt reportBolt = new ReportBolt();
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(SPOUT_ID, kspout, numSpoutExecutors);
builder.setBolt(REPORT_BOLT_ID, reportBolt);
Config cfg = new Config();
StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());
}
private static KafkaSpout buildKafkaSpout() {
String zkHostPort = "URL";
String topic = "TOPIC";
String zkRoot = "/brokers";
String zkSpoutId = "monitoring_test_spout_id";
ZkHosts zkHosts = new ZkHosts(zkHostPort);
SpoutConfig spoutCfg = new SpoutConfig(zkHosts, topic, zkRoot, zkSpoutId);
KafkaSpout kafkaSpout = new KafkaSpout(spoutCfg);
return kafkaSpout;
}
}

Your bolt is not chained to the spout. You need to use one of Storm's stream groupings to do that. Use something like this:
builder.setBolt(REPORT_BOLT_ID, reportBolt).shuffleGrouping(SPOUT_ID);
setBolt returns an InputDeclarer object. By specifying shuffleGrouping(SPOUT_ID) you are telling Storm that this bolt wants to consume all the tuples emitted by the component with id SPOUT_ID.
Read more on stream groupings and choose the one that fits your needs.
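For reference, a minimal sketch of the corrected wiring in the question's main method (same names as in the question; only the shuffleGrouping call changes):
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(SPOUT_ID, kspout, numSpoutExecutors);
// Without a grouping the bolt subscribes to no stream, so execute() is never
// called and nothing is printed, even though the spout connects to Kafka fine.
builder.setBolt(REPORT_BOLT_ID, reportBolt).shuffleGrouping(SPOUT_ID);
Config cfg = new Config();
StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());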

Related

Configuration to handle Dead-letter queue

I have a project that uses Spring Cloud Stream with RabbitMQ to exchange messages between microservices. One thing that is critical for my project is that I must not lose any message.
In order to minimize failures, I planned the following:
Use the default retry mechanism for messages in the queue
Configure a dead-letter queue to put messages back on the main queue after some time
To avoid an infinite loop, allow a message to be republished from the dead-letter queue to the regular queue only a limited number of times (say, 5)
I believe I can cover the first two items with the configuration below:
#dlx/dlq setup - retry dead letter 5 minutes later (300000ms later)
spring.cloud.stream.rabbit.bindings.input.consumer.auto-bind-dlq=true
spring.cloud.stream.rabbit.bindings.input.consumer.republish-to-dlq=true
spring.cloud.stream.rabbit.bindings.input.consumer.dlq-ttl=300000
spring.cloud.stream.rabbit.bindings.input.consumer.dlq-dead-letter-exchange=
#input
spring.cloud.stream.bindings.myInput.destination=my-queue
spring.cloud.stream.bindings.myInput.group=my-group
However, searching this reference guide I could not find how to do what I want (mainly, how to configure a maximum number of republishes from the dead-letter queue). I'm not completely sure I'm on the right path - maybe I should manually create a second queue and code what I want, leaving the dead-letter queue only for messages that completely failed (which I would have to check regularly and handle manually, since my system must not lose any messages)...
I'm new to these frameworks, and I would like your help configuring mine...
This documentation for the rabbit binder shows how to publish a dead-lettered message to a parking-lot queue after some number of retries have failed.
@SpringBootApplication
public class ReRouteDlqApplication {
private static final String ORIGINAL_QUEUE = "so8400in.so8400";
private static final String DLQ = ORIGINAL_QUEUE + ".dlq";
private static final String PARKING_LOT = ORIGINAL_QUEUE + ".parkingLot";
private static final String X_RETRIES_HEADER = "x-retries";
public static void main(String[] args) throws Exception {
ConfigurableApplicationContext context = SpringApplication.run(ReRouteDlqApplication.class, args);
System.out.println("Hit enter to terminate");
System.in.read();
context.close();
}
@Autowired
private RabbitTemplate rabbitTemplate;
@RabbitListener(queues = DLQ)
public void rePublish(Message failedMessage) {
Integer retriesHeader = (Integer) failedMessage.getMessageProperties().getHeaders().get(X_RETRIES_HEADER);
if (retriesHeader == null) {
retriesHeader = Integer.valueOf(0);
}
if (retriesHeader < 3) {
failedMessage.getMessageProperties().getHeaders().put(X_RETRIES_HEADER, retriesHeader + 1);
this.rabbitTemplate.send(ORIGINAL_QUEUE, failedMessage);
}
else {
this.rabbitTemplate.send(PARKING_LOT, failedMessage);
}
}
@Bean
public Queue parkingLot() {
return new Queue(PARKING_LOT);
}
}
The second example shows how to use the delayed exchange plugin to delay between retries.
@SpringBootApplication
public class ReRouteDlqApplication {
private static final String ORIGINAL_QUEUE = "so8400in.so8400";
private static final String DLQ = ORIGINAL_QUEUE + ".dlq";
private static final String PARKING_LOT = ORIGINAL_QUEUE + ".parkingLot";
private static final String X_RETRIES_HEADER = "x-retries";
private static final String DELAY_EXCHANGE = "dlqReRouter";
public static void main(String[] args) throws Exception {
ConfigurableApplicationContext context = SpringApplication.run(ReRouteDlqApplication.class, args);
System.out.println("Hit enter to terminate");
System.in.read();
context.close();
}
@Autowired
private RabbitTemplate rabbitTemplate;
@RabbitListener(queues = DLQ)
public void rePublish(Message failedMessage) {
Map<String, Object> headers = failedMessage.getMessageProperties().getHeaders();
Integer retriesHeader = (Integer) headers.get(X_RETRIES_HEADER);
if (retriesHeader == null) {
retriesHeader = Integer.valueOf(0);
}
if (retriesHeader < 3) {
headers.put(X_RETRIES_HEADER, retriesHeader + 1);
headers.put("x-delay", 5000 * retriesHeader);
this.rabbitTemplate.send(DELAY_EXCHANGE, ORIGINAL_QUEUE, failedMessage);
}
else {
this.rabbitTemplate.send(PARKING_LOT, failedMessage);
}
}
@Bean
public DirectExchange delayExchange() {
DirectExchange exchange = new DirectExchange(DELAY_EXCHANGE);
exchange.setDelayed(true);
return exchange;
}
@Bean
public Binding bindOriginalToDelay() {
return BindingBuilder.bind(new Queue(ORIGINAL_QUEUE)).to(delayExchange()).with(ORIGINAL_QUEUE);
}
@Bean
public Queue parkingLot() {
return new Queue(PARKING_LOT);
}
}
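Adapted to the limits described in the question (at most 5 republish attempts, each retried 5 minutes later), the re-publish branch of the listener above would look roughly like this; the constants and header names are the ones from the examples, and the 300000 ms value mirrors the question's dlq-ttl setting:
if (retriesHeader < 5) {
    headers.put(X_RETRIES_HEADER, retriesHeader + 1);
    headers.put("x-delay", 300000); // 5 minutes; needs the RabbitMQ delayed-message exchange plugin
    this.rabbitTemplate.send(DELAY_EXCHANGE, ORIGINAL_QUEUE, failedMessage);
}
else {
    this.rabbitTemplate.send(PARKING_LOT, failedMessage);
}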

Task not Serializable - Spark Java

I'm getting the Task not serializable error in Spark. I've searched and tried to use a static function as suggested in some posts, but it still gives the same error.
Code is as below:
public class Rating implements Serializable {
private SparkSession spark;
private SparkConf sparkConf;
private JavaSparkContext jsc;
private static Function<String, Rating> mapFunc;
public Rating() {
mapFunc = new Function<String, Rating>() {
public Rating call(String str) {
return Rating.parseRating(str);
}
};
}
public void runProcedure() {
sparkConf = new SparkConf().setAppName("Filter Example").setMaster("local");
jsc = new JavaSparkContext(sparkConf);
SparkSession spark = SparkSession.builder().master("local").appName("Word Count")
.config("spark.some.config.option", "some-value").getOrCreate();
JavaRDD<Rating> ratingsRDD = spark.read().textFile("sample_movielens_ratings.txt")
.javaRDD()
.map(mapFunc);
}
public static void main(String[] args) {
Rating newRating = new Rating();
newRating.runProcedure();
}
}
The error it gives is the "Task not serializable" exception.
How do I solve this error?
Thanks in advance.
Clearly Rating cannot be serialized, because it holds references to Spark structures (i.e. SparkSession, SparkConf, etc.) as fields.
The problem here is in
JavaRDD<Rating> ratingsRDD = spark.read().textFile("sample_movielens_ratings.txt")
.javaRDD()
.map(mapFunc);
If you look at the definition of mapFunc, you're returning a Rating object.
mapFunc = new Function<String, Rating>() {
public Rating call(String str) {
return Rating.parseRating(str);
}
};
This function is used inside a map (a transformation, in Spark terms). Because transformations are executed on the worker nodes and not on the driver node, their code must be serializable. This forces Spark to try to serialize the Rating class, which is not possible.
Try to extract the features you need from Rating and place them in a different class that does not hold any Spark structures. Finally, use this new class as the return type of your mapFunc function.
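For example, a minimal sketch of that idea (the class name, fields, and line format below are illustrative assumptions, not taken from the question; the parser assumes the usual MovieLens "userId::movieId::rating::timestamp" layout):
import java.io.Serializable;

// Plain data holder: no Spark structures, so it serializes cleanly to the workers.
public class RatingData implements Serializable {
    private final int userId;
    private final int movieId;
    private final double rating;

    public RatingData(int userId, int movieId, double rating) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
    }

    // Hypothetical parser; adjust to your actual line format.
    public static RatingData parse(String line) {
        String[] fields = line.split("::");
        return new RatingData(Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Double.parseDouble(fields[2]));
    }
}
The driver code can then keep SparkSession and the other Spark objects in a separate launcher class and map with a function that only references RatingData, e.g. .map(RatingData::parse), so nothing non-serializable is captured by the closure.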
In addition, be sure not to include non-serializable fields such as JavaSparkContext and SparkSession in your class. If you do need to keep them, declare them like this:
private transient JavaSparkContext sparkCtx;
private transient SparkSession spark;
Good luck.

Why does static keyword "fix" the issue of Task not serializable?

I have experienced the "task not serializable" issue when running Spark Streaming. The reason can be found in this thread.
After trying a couple of methods and fixing the issue, I don't understand why it works.
public class StreamingNotWorking implements Serializable {
private SparkConf sparkConf;
private JavaStreamingContext jssc;
public StreamingNotWorking(parameter) {
sparkConf = new SparkConf();
this.jssc = createContext(parameter);
JavaDStream<String> messages = functionCreateStream(parameter);
messages.print();
}
public void run() {
this.jssc.start();
this.jssc.awaitTermination();
}
}
public class streamingNotWorkingDriver {
public static void main(String[] args) {
StreamingNotWorking bieventsStreaming = new StreamingNotWorking(parameter);
bieventsStreaming.run();
}
}
This gives the same "Task not serializable" error.
However, if I modify the code to:
public class StreamingWorking implements Serializable {
private static SparkConf sparkConf;
private static JavaStreamingContext jssc;
public void createStream(parameter) {
sparkConf = new SparkConf();
this.jssc = createContext(parameter);
JavaDStream<String> messages = functionCreateStream(parameter);
messages.print();
run();
}
public void run() {
this.jssc.start();
this.jssc.awaitTermination();
}
}
public class streamingWorkingDriver {
public static void main(String[] args) {
StreamingWorking bieventsStreaming = new StreamingWorking();
bieventsStreaming.createStream(parameter);
}
}
works perfectly fine.
I know one of the reasons is that sparkConf and jssc need to be static. But I don't understand why.
Could anyone explain the difference?
Neither JavaStreamingContext nor SparkConf implements Serializable. You can't serialize instances of classes that don't implement this interface.
Static members won't be serialized.
More information can be found here:
http://docs.oracle.com/javase/7/docs/api/java/io/Serializable.html
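To see the effect outside of Spark, here is a contrived, self-contained sketch (not your code) of why a non-serializable instance field breaks serialization of a closure while a static field does not: an anonymous class created inside an instance method keeps a hidden reference to the enclosing object, so serializing the closure also serializes every non-static field of that object.
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticCaptureDemo implements Serializable {

    // Stand-in for SparkConf / JavaStreamingContext: a type that is NOT serializable.
    static class NotSerializable { }

    interface SerializableRunnable extends Runnable, Serializable { }

    private NotSerializable instanceField = new NotSerializable();       // instance state: written during serialization
    private static NotSerializable staticField = new NotSerializable();  // class state: skipped by serialization

    // An anonymous class created in an instance method keeps a hidden reference to 'this'.
    SerializableRunnable closure() {
        return new SerializableRunnable() {
            public void run() { System.out.println("outer = " + StaticCaptureDemo.this); }
        };
    }

    public static void main(String[] args) throws Exception {
        ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream());
        try {
            // Fails: serializing the closure drags in the outer instance, whose
            // instanceField is not serializable. Make that field static (or transient)
            // and the same call succeeds, which is exactly the Spark situation.
            out.writeObject(new StaticCaptureDemo().closure());
            System.out.println("serialized fine");
        } catch (NotSerializableException e) {
            System.out.println("task not serializable: " + e);
        }
    }
}
In the StreamingWorking version, sparkConf and jssc are static, so they are not part of the instance state that any closures (presumably created inside functionCreateStream) drag along.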

Akka: Testing Supervisor Recommendations

I am very new to Akka and using Java to program my system.
Problem definition
- I have a TenantMonitor which, when it receives a TenantMonitorMessage(), starts a new actor, DiskMonitorActor.
- The DiskMonitorActor may fail for various reasons and may throw a DiskException. The DiskMonitorActor has been unit tested.
What I need?
- I want to test the behavior of TenantMonitorActor, so that when a DiskException happens, it takes the correct action such as stop(), resume(), or whatever my application may need.
What I tried?
Based on the documentation, the closest I could find is the section called Expecting Log Messages.
Where I need help?
- While I understand that expecting the correct error log is important, it only asserts the first part, that the exception is thrown and logged correctly; it does not help in asserting that the right supervision strategy is applied.
Code?
TenantMonitorActor
public class TenantMonitorActor extends UntypedActor {
public static final String DISK_MONITOR = "diskMonitor";
private static final String assetsLocationKey = "tenant.assetsLocation";
private static final String schedulerKey = "monitoring.tenant.disk.schedule.seconds";
private static final String thresholdPercentKey = "monitoring.tenant.disk.threshold.percent";
private final LoggingAdapter logging = Logging.getLogger(getContext().system(), this);
private final Config config;
private TenantMonitorActor(final Config config) {
this.config = config;
}
private static final SupervisorStrategy strategy =
new OneForOneStrategy(1, Duration.create(1, TimeUnit.SECONDS),
new Function<Throwable, Directive>() {
public Directive apply(final Throwable param) throws Exception {
if (param instanceof DiskException) {
return stop();
}
return restart();
}
});
public static Props props(final Config config) {
return Props.create(new Creator<TenantMonitorActor>(){
public TenantMonitorActor create() throws Exception {
return new TenantMonitorActor(config);
}
});
}
@Override
public void onReceive(final Object message) throws Exception {
if (message instanceof TenantMonitorMessage) {
logging.info("Tenant Monitor Setup");
setupDiskMonitoring();
}
}
@Override
public SupervisorStrategy supervisorStrategy() {
return strategy;
}
private void setupDiskMonitoring() {
final ActorRef diskMonitorActorRef = getDiskMonitorActorRef(config);
final FiniteDuration start = Duration.create(0, TimeUnit.SECONDS);
final FiniteDuration recurring = Duration.create(config.getInt(schedulerKey),
TimeUnit.SECONDS);
final ActorSystem system = getContext().system();
system.scheduler()
.schedule(start, recurring, diskMonitorActorRef,
new DiskMonitorMessage(), system.dispatcher(), null);
}
private ActorRef getDiskMonitorActorRef(final Config monitoringConf) {
final Props diskMonitorProps =
DiskMonitorActor.props(new File(monitoringConf.getString(assetsLocationKey)),
monitoringConf.getLong(thresholdPercentKey));
return getContext().actorOf(diskMonitorProps, DISK_MONITOR);
}
}
Test
@Test
public void testActorForNonExistentLocation() throws Exception {
final Map<String, String> configValues =
Collections.singletonMap("tenant.assetsLocation", "/non/existentLocation");
final Config config = mergeConfig(configValues);
new JavaTestKit(system) {{
assertEquals("system", system.name());
final Props props = TenantMonitorActor.props(config);
final ActorRef supervisor = system.actorOf(props, "supervisor");
new EventFilter<Void>(DiskException.class) {
@Override
protected Void run() {
supervisor.tell(new TenantMonitorMessage(), ActorRef.noSender());
return null;
}
}.from("akka://system/user/supervisor/diskMonitor").occurrences(1).exec();
}};
}
UPDATE
The best I could come up with is to make sure that the DiskMonitor is stopped once the exception occurs:
@Test
public void testSupervisorForFailure() {
new JavaTestKit(system) {{
final Map<String, String> configValues =
Collections.singletonMap("tenant.assetsLocation", "/non/existentLocation");
final Config config = mergeConfig(configValues);
final TestActorRef<TenantMonitorActor> tenantTestActorRef = getTenantMonitorActor(config);
final ActorRef diskMonitorRef = tenantTestActorRef.underlyingActor().getContext()
.getChild(TenantMonitorActor.DISK_MONITOR);
final TestProbe testProbeDiskMonitor = new TestProbe(system);
testProbeDiskMonitor.watch(diskMonitorRef);
tenantTestActorRef.tell(new TenantMonitorMessage(), getRef());
testProbeDiskMonitor.expectMsgClass(Terminated.class);
}};
}
Are there better ways?
I have the feeling that testing a supervisor strategy is something of a grey area: it is a matter of personal opinion where we stop testing our own use of the framework and start testing Akka itself. Testing validation of entities in ORM frameworks strikes me as a similar problem. We don't want to test whether the email validation logic itself is correct (e.g. in Hibernate), but rather that our rule is declared correctly.
Following this logic, I would write the test as follows:
final TestActorRef<TenantMonitorActor> tenantTestActorRef =
getTenantMonitorActor(config);
SupervisorStrategy.Directive directive = tenantTestActorRef.underlyingActor()
.supervisorStrategy().decider().apply(new DiskException());
assertEquals(SupervisorStrategy.stop(), directive);
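A companion assertion in the same style can pin down the fallback branch of the strategy (everything other than DiskException should restart, per the decider above):
SupervisorStrategy.Directive fallback = tenantTestActorRef.underlyingActor()
        .supervisorStrategy().decider().apply(new RuntimeException("any other failure"));
assertEquals(SupervisorStrategy.restart(), fallback);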

Thread issue while using Executor service

I am facing a threading issue in the code below. When the thread executes the run method of the runnable object, it doesn't print the data that I expect.
code 1--calling code
Map<String,Object> logData = CPEMethodData.getLogDataMap();
CatalogUpdaterLogger.getLogger().info("6 before new splunk logger log data =" + logData);
CatalogLogger writer = new CatalogLogger(LogType.INFO,logData,LoggerType.CATALOGUPDATER);
LogPool.INSTANCE.submitTask(writer);//submitting writer which is a runnable object to the queue
//add one more task/writer to the queue in the same method
logData = CPEMethodData.getLogDataMap();
CatalogUpdaterLogger.getLogger().info("11 before 3rd writer=logData "+logData);
CatalogLogger writer2 = new CatalogLogger(LogType.INFO,logData,LoggerType.CATALOGUPDATER);
LogPool.INSTANCE.submitTask(writer2);
In the above code, I have checked that the logData returned by CPEMethodData.getLogDataMap() is different, as I expected. But when the runnable objects actually execute, they run with the same data...
code 2--creating thread pool with 5 threads...
public enum LogPool {
INSTANCE;
private static final int nThreads = 5;
final ExecutorService executor = Executors.newFixedThreadPool(nThreads);
public synchronized void submitTask(Runnable task) {
executor.execute(task);
}
}
Code 3--runnable code
public class CatalogLogger implements Runnable {
protected LogType logType;
protected LoggerType loggerType;
protected Map<String, Object> logData;
public CatalogLogger(LogType logType, Map<String, Object> logData,
LoggerType loggerType) {
this.logType = logType;
this.logData = logData;
this.loggerType = loggerType;
}
public void run() {
System.out.println("running with logData " + logData);
System.out.println(" Thread.currentThread().hashCode() " +Thread.currentThread().hashCode());
switch (loggerType) {
case ORDERPROCESSING:
logData(Logger.getLogger(ORDER_LOG));
break;
case CATALOGUPDATER:
logData(Logger.getLogger(CATALOGUPDATER_LOG));
break;
}
}
}
Below is CPEMethodData.getLogDataMap:
public class CPEMethodData {
private static ThreadLocal<Map<String, Object>> logDataMap = new ThreadLocal<Map<String, Object>>();
public static Map<String,Object> getLogDataMap() {
return logDataMap.get();
}
public static void setOppParameters(Map<String, Object> inputParams) {
Map<String, Object> oppStatus = logDataMap.get();
if (oppStatus == null) {
oppStatus = new HashMap<String, Object>();
logDataMap.set(oppStatus);
}
oppStatus.put(INPUT_PARAMS, inputParams);
}
@SuppressWarnings("unchecked")
public static Map<String, Object> getOperationParameters() {
Map<String, Object> oppStatus = logDataMap.get();
if (oppStatus != null)
return (Map<String, Object>) oppStatus.get(INPUT_PARAMS);
return null;
}
}
When I run code 1, which submits two runnables to the queue, I expect to see different logData content in the sysout of the run method. But as I debugged it, I see that the data is the same in both executions... it seems that the 2nd runnable is interfering with the first one. Can anyone please help me understand what the problem is here? I thought I was passing 2 different instances of CatalogLogger, so it shouldn't cause any problem. Also, can anyone please suggest a solution for this?
As written by @ReneLink in the comment to my question, CPEMethodData.getLogDataMap was returning the same instance of the HashMap. So by the time a thread's run method executed, the HashMap's contents had already been modified. I created a deep copy of the HashMap using the Cloner facility and passed that to the thread.
Thanks @ReneLink for pointing this out to me.
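For anyone hitting the same issue, the essential change on the submitting side is to give each runnable its own copy of the map rather than the shared ThreadLocal instance. A minimal sketch (here a plain copy via the HashMap copy constructor; the original fix used a deep clone, which matters if the map's values are themselves mutated later):
Map<String, Object> logData = CPEMethodData.getLogDataMap();
// Copy before handing off, so later changes to the ThreadLocal map are not
// visible to the already-submitted runnable.
Map<String, Object> snapshot = new HashMap<String, Object>(logData);
CatalogLogger writer = new CatalogLogger(LogType.INFO, snapshot, LoggerType.CATALOGUPDATER);
LogPool.INSTANCE.submitTask(writer);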
