Building a Storm topology with shell bolts - java

I'm currently trying to implement a Storm topology that integrates with the R language.
As a starting point, I took the following project (https://github.com/allenday/R-Storm), which works by extending the ShellBolt class to implement the R integration, together with an R library that handles communication between the Java and R sides.
My problem is that if I create a topology based on regular (Java-only) bolts, I can chain them together without issue. However, when one of the bolts in the middle of the chain is an R shell bolt, everything falls apart with:
5661 [Thread-18] ERROR backtype.storm.util - Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
Shell Process Exception:
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.daemon.executor$fn__3557$fn__3569$fn__3616.invoke(executor.clj:715) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip16.jar:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
More concretely, the following topology works as expected:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
Here PermuteBolt is an R shell bolt. The logs for this example show the expected output:
6246 [Thread-18] INFO backtype.storm.daemon.task - Emitting: spout default [four score and seven years ago]
6246 [Thread-16] INFO backtype.storm.daemon.executor - Processing received message source: spout:3, stream: default, id: {}, [four score and seven years ago]
6261 [Thread-23] INFO backtype.storm.daemon.task - Emitting: permutebolt default ["PERMUTE seven years ago and four score"]
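For context, a shell bolt on the Java side is just a thin wrapper that launches an external process and declares the fields it emits. A minimal sketch of what a bolt like PermuteBolt typically looks like in Storm 0.9.x is shown below; the command and script name are illustrative assumptions, not taken from the R-Storm source.
import java.util.Map;

import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

// Hedged sketch of a shell-based bolt for Storm 0.9.x; the external command and
// script are assumptions for illustration only.
public class PermuteBolt extends ShellBolt implements IRichBolt {

    public PermuteBolt() {
        // Launch the external process that does the actual work.
        super("Rscript", "permute.R");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Downstream bolts can only group on fields declared here.
        declarer.declare(new Fields("permutation"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}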
If, however, I add another bolt that gets its data from the first one, such as:
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
builder.setBolt("identity", new IdentityBolt(new Fields("identity")), 1).fieldsGrouping("permutebolt", new Fields("permutation"));
It fails with the trace printed above. What is also strange is that this failing second example is included with the project itself.
Is this an issue anyone has faced before?
UPDATE: I noticed this only occurs when using R shell bolts; I have since tried launching bolts that use Python scripts and have been able to chain them normally.

@andrei, this is fixed in v1.01, uploaded to GitHub today:
https://github.com/allenday/R-Storm/releases/tag/v1.01
It has been submitted to CRAN and will be available soon.
Thanks for reporting.
-Allen

Related

Spark Error: I/O error constructing remote block reader. java.nio.channels.ClosedByInterruptException at java.nio.channels.ClosedByInterruptException

The execution was fine locally in unit tests, but it fails when the Spark Streaming execution is propagated to the real cluster executors: they seem to crash silently and are no longer available to the context:
stream execution thread for kafkaDataGeneratorInactiveESP_02/Distance [id = 438f45a0-acd6-4729-953f-5a18ae208f1f, runId = a98c6d39-fe14-4ed5-b7fe-7e4009de51b2]] impl.BlockReaderFactory (BlockReaderFactory.java:getRemoteBlockReaderFromTcp(765)) - I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:656)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:2940)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:822)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:747)
at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:380)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:644)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:575)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
at java.io.DataInputStream.read(DataInputStream.java:149)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at scala.io.BufferedSource$BufferedLineIterator.hasNext(BufferedSource.scala:74)
at org.apache.spark.sql.execution.streaming.CommitLog.deserialize(CommitLog.scala:56)
at org.apache.spark.sql.execution.streaming.CommitLog.deserialize(CommitLog.scala:48)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:153)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.$anonfun$getLatest$2(HDFSMetadataLog.scala:190)
at scala.runtime.java8.JFunction1$mcVJ$sp.apply(JFunction1$mcVJ$sp.java:23)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:258)
at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.getLatest(HDFSMetadataLog.scala:189)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.populateStartOffsets(MicroBatchExecution.scala:300)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$2(MicroBatchExecution.scala:194)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken(ProgressReporter.scala:352)
at org.apache.spark.sql.execution.streaming.ProgressReporter.reportTimeTaken$(ProgressReporter.scala:350)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:69)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.$anonfun$runActivatedStream$1(MicroBatchExecution.scala:191)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:57)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:185)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:334)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:245)
The first thing I tried was changing the query name, which contained a slash and a space: kafkaDataGeneratorInactiveESP_02/Distance
After replacing it with a sanitized one in
.queryName("kafkaDataGeneratorInactive" + currentIter.metadata.getString("label"))
which was guaranteed to contain no / or space, the error did not go away.
The actual reason for the failure was using the same name for the query name and the checkpoint location path (not the part I had tried to improve first). Later I found one more error log:
2021-12-01 15:05:46,906 WARN [main] streaming.StreamingQueryManager (Logging.scala:logWarning(69)) - Stopping existing streaming query [id=b13a69d7-5a2f-461e-91a7-a9138c4aa716, runId=9cb31852-d276-42d8-ade6-9839fa97f85c], as a new run is being started.
Why was the query stopped? Because in Scala I was creating streaming queries in a loop, iterating over a collection, while keeping all the query names and all the checkpoint locations the same. After making them unique (I just used the string values from the collection), the failure went away.
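For illustration, here is a minimal Java sketch of that fix (the original code was Scala; the source, broker address, topic naming, and paths below are assumptions): every query started in the loop gets its own query name and its own checkpoint location.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class UniqueQueryNames {
    // Start one streaming query per label; name and checkpoint path are unique per query.
    static void startQueries(SparkSession spark, Iterable<String> labels) throws Exception {
        for (String label : labels) {
            Dataset<Row> stream = spark.readStream()
                    .format("kafka")
                    .option("kafka.bootstrap.servers", "broker:9092") // assumed broker address
                    .option("subscribe", "events-" + label)           // assumed topic naming
                    .load();
            stream.writeStream()
                  .queryName("kafkaDataGeneratorInactive" + label)                // unique name
                  .option("checkpointLocation", "/checkpoints/inactive/" + label) // unique path
                  .format("console")
                  .start();
        }
    }
}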

Deque Full Exception - Thingsboard

I'm having trouble with the ThingsBoard (IoT) platform when simulating 7.5K devices sending data to it. I get the following error in the logs as soon as I start sending data (over MQTT):
2020-08-01 01:17:06,946 [ForkJoinPool-12-worker-0] ERROR c.g.c.u.concurrent.AggregateFuture - Got more than one input Future failure. Logging failures after the first
java.lang.IllegalStateException: Deque full
at java.util.concurrent.LinkedBlockingDeque.addLast(LinkedBlockingDeque.java:335)
at java.util.concurrent.LinkedBlockingDeque.add(LinkedBlockingDeque.java:633)
at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.submit(AbstractBufferedRateExecutor.java:109)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsync(CassandraAbstractDao.java:93)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsyncWrite(CassandraAbstractDao.java:76)
at org.thingsboard.server.dao.timeseries.CassandraBaseTimeseriesDao.savePartition(CassandraBaseTimeseriesDao.java:434)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.saveAndRegisterFutures(BaseTimeseriesService.java:153)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.save(BaseTimeseriesService.java:144)
at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotify(DefaultTelemetrySubscriptionService.java:124)
at org.thingsboard.rule.engine.telemetry.TbMsgTimeseriesNode.onMsg(TbMsgTimeseriesNode.java:89)
at org.thingsboard.server.actors.ruleChain.RuleNodeActorMessageProcessor.onRuleChainToRuleNodeMsg(RuleNodeActorMessageProcessor.java:107)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.onRuleChainToRuleNodeMsg(RuleNodeActor.java:97)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.doProcess(RuleNodeActor.java:60)
at org.thingsboard.server.actors.service.ContextAwareActor.process(ContextAwareActor.java:45)
at org.thingsboard.server.actors.TbActorMailbox.processMailbox(TbActorMailbox.java:121)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
I have tried googling the reason behind it, but I haven't found anything.
While simulating with 5K devices, this error appeared about 3 times a day (over a 4-day period), but it eventually stopped showing up. However, when I increase the number of devices, the error is constant. I'm using Kafka as the broker, but I don't see any Kafka-related errors. I just want to know why the error appears: is it related to memory, or to some other limit?
Thanks in advance
Francisco P

Failed to delete the state directory in IDE for Kafka Stream Application

I am developing a simple Kafka Streams application which extracts messages from a topic and puts them into another topic after transformation. I am using IntelliJ for my development.
When I debug/run this application, it works perfectly if my IDE and the Kafka server sit on the SAME machine
(i.e. with the BOOTSTRAP_SERVERS_CONFIG = localhost:9092 and
SCHEMA_REGISTRY_URL_CONFIG = localhost:8081)
However, when I try to use another machine to do the development
(i.e. with the BOOTSTRAP_SERVERS_CONFIG = XXX.XXX.XXX:9092 and
SCHEMA_REGISTRY_URL_CONFIG = XXX.XXX.XXX:8081 where XXX.XXX.XXX is the
ip address of my Kafka),
the debug process runs without problems the first time. However, when I run it a second time after resetting the offset, I receive the following error:
ERROR stream-thread [main] Failed to delete the state directory. (org.apache.kafka.streams.processor.internals.StateDirectory:297)
java.nio.file.DirectoryNotEmptyException: \tmp\kafka-streams\my_application_id\0_0
Exception in thread "main" org.apache.kafka.streams.errors.StreamsException: java.nio.file.DirectoryNotEmptyException:
If I change my_application_id to my_application_id2 and run it, it again works the first time but produces the error again on the second run.
I have the following code as the last statement in my application:
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
Any advice on how to solve this problem?
UPDATE:
I have reviewed the state directory created on my development machine (Windows platform), and if I delete these directories manually before the second run, no error occurs. I have tried running my IDE as Administrator because I thought this could be a permission issue on the folder, but this doesn't help.
Full stack trace for reference:
INFO Kafka version : 1.1.0 (org.apache.kafka.common.utils.AppInfoParser:109)
INFO Kafka commitId : fdcf75ea326b8e07 (org.apache.kafka.common.utils.AppInfoParser:110)
INFO stream-thread [main] Deleting state directory 0_0 for task 0_0 as user calling cleanup. (org.apache.kafka.streams.processor.internals.StateDirectory:281)
Disconnected from the target VM, address: '127.0.0.1:16552', transport: 'socket'
Exception in thread "main" org.apache.kafka.streams.errors.StreamsException: java.nio.file.DirectoryNotEmptyException: C:\workspace\bennychan\kafka-streams\my_application_001\0_0
at org.apache.kafka.streams.processor.internals.StateDirectory.clean(StateDirectory.java:231)
at org.apache.kafka.streams.KafkaStreams.cleanUp(KafkaStreams.java:931)
at com.macroviewhk.financialreport.simpleStream.start(simpleStream.java:60)
at com.macroviewhk.financialreport.simpleStream.main(simpleStream.java:45)
Caused by: java.nio.file.DirectoryNotEmptyException: C:\workspace\bennychan\kafka-streams\my_application_001\0_0
at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:266)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:651)
at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:634)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:634)
ERROR stream-thread [main] Failed to delete the state directory. (org.apache.kafka.streams.processor.internals.StateDirectory:297)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:287)
java.nio.file.DirectoryNotEmptyException: C:\workspace\bennychan\kafka-streams\my_application_001\0_0
at org.apache.kafka.streams.processor.internals.StateDirectory.clean(StateDirectory.java:228)
at sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:266)
... 3 more
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:651)
at org.apache.kafka.common.utils.Utils$1.postVisitDirectory(Utils.java:634)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:634)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:287)
at org.apache.kafka.streams.processor.internals.StateDirectory.clean(StateDirectory.java:228)
at org.apache.kafka.streams.KafkaStreams.cleanUp(KafkaStreams.java:931)
at com.macroviewhk.financialreport.simpleStream.start(simpleStream.java:60)
at com.macroviewhk.financialreport.simpleStream.main(simpleStream.java:45)
UPDATE 2:
After another detailed check, the line below throws the IOException:
Files.walkFileTree(file.toPath(), new SimpleFileVisitor<Path>() {
This line is located in kafka-clients-1.1.0.jar, in org.apache.kafka.common.utils.Utils.class.
Maybe this is a problem with the Windows system (sorry, I am not an experienced Java programmer).
For googlers...
I'm currently using this Scala code to help Windows users handle deletion of the state store.
if (System.getProperty("os.name").toLowerCase.contains("windows")) {
  logger.info("WINDOWS OS MODE - Cleanup state store.")
  try {
    FileUtils.deleteDirectory(new File("/tmp/kafka-streams/" + config.getProperty("application.id")))
    FileUtils.forceMkdir(new File("/tmp/kafka-streams/" + config.getProperty("application.id")))
  } catch {
    case e: Exception => logger.error(e.toString)
  }
} else {
  streams.cleanUp()
}
I agree with @ideano1 that it seems to be related to https://issues.apache.org/jira/browse/KAFKA-6647 -- what you can try is to explicitly call KafkaStreams#cleanUp() between tests. It's unclear why there are issues on Windows; at the moment, all testing happens on Linux.
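A minimal Java sketch of that pattern, assuming the usual topology/properties setup: cleanUp() may only be called while the instance is not running, i.e. before start() or after close().
import java.util.Properties;

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.Topology;

public class CleanRestart {
    // Sketch only: wipe the local state directory before starting, and again
    // (best effort) after shutting down.
    public static void run(Topology topology, Properties props) {
        KafkaStreams streams = new KafkaStreams(topology, props);
        streams.cleanUp();   // allowed here because the instance is not running yet
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            streams.close();
            streams.cleanUp(); // may still fail on Windows, see KAFKA-6647
        }));
    }
}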
This is what we've implemented and it works on Windows. It is written in Kotlin.
Version used: kafka-streams-test-utils:2.3.0.
The key is to catch the exception. The tests will pass as long as you catch the exception raised by testDriver.close(), even if you don't delete the directory. However, cleaning up the directory makes your unit tests independent and repeatable.
val directory = "test"

@BeforeEach
fun setup() {
    // other code omitted for setting the props
    props.setProperty(StreamsConfig.STATE_DIR_CONFIG, directory)
}

@AfterEach
fun tearDown() {
    try {
        testDriver.close()
    } catch (exception: Exception) {
        // There is a bug on Windows that prevents the state directory from being
        // deleted properly; for the test to pass, the directory must be deleted manually.
        FileUtils.deleteDirectory(File(directory))
    }
}
For tests (and not only tests, if you can afford it), one could use an IN_MEMORY("in-memory") store for each KTable created (directly or indirectly, e.g. by aggregations); this avoids the creation of any state directory, so the error no longer occurs.
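A minimal sketch of that idea in Java (assuming a 2.x Streams API; the topic and store names are made up): the KTable produced by the aggregation is materialized in an in-memory store instead of the default RocksDB store, so nothing for it is written to the state directory.
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class InMemoryAggregation {
    // Sketch only: the count() result is backed by an in-memory key-value store.
    static KTable<String, Long> build(StreamsBuilder builder) {
        KStream<String, String> input = builder.stream("input-topic"); // assumed topic
        return input
                .groupByKey()
                .count(Materialized.<String, Long>as(
                        Stores.inMemoryKeyValueStore("counts-store"))); // assumed store name
    }
}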

File lock exception when using graphHopper in java program

I'm using GraphHopper in the following way:
GraphHopper hopper = new GraphHopper().forServer();
hopper.setCHEnable(false);
hopper.setGraphHopperLocation(GraphHoperMasterFile);
hopper.setOSMFile(OSMFile);
hopper.setEncodingManager(new EncodingManager("car,bike"));
hopper.importOrLoad();

GHRequest req = new GHRequest()
        .addPoint(new GHPoint(latFrom, lonFrom))
        .addPoint(new GHPoint(latTo, lonTo))
        .setVehicle("car")
        .setWeighting("fastest")
        .setAlgorithm(AlgorithmOptions.ASTAR_BI);
req.getHints().put("pass_through", true);
GHResponse res = hopper.route(req);
I obtained the GraphHoperMasterFile by downloading the zip from https://github.com/graphhopper/graphhopper/blob/0.5/docs/core/routing.md.
I obtained the .osm file from http://download.geofabrik.de/europe/great-britain/england/greater-london.html.
I also added the Maven dependency from http://mvnrepository.com/artifact/com.graphhopper/graphhopper-web/0.5.0. I get the sense that it's wrong to both have the Maven dependency and reference the graphHopperLocation, but I'm not sure.
When I run this code I sometimes (not all the time) get the following errors:
java.util.concurrent.ExecutionException: java.lang.RuntimeException: To avoid reading partial data we need to obtain the read lock but it failed.
Caused by: java.lang.RuntimeException: To avoid reading partial data we need to obtain the read lock but it failed.
Caused by: java.nio.channels.OverlappingFileLockException
When it works I get the following:
2016-01-28 08:48:14,551 [pool-1-thread-8] INFO com.graphhopper.GraphHopper - version 0.5.0|2015-08-12T12:33:51+0000 (4,12,3,2,2,1)
2016-01-28 08:48:14,551 [pool-1-thread-8] INFO com.graphhopper.GraphHopper - graph car,bike|RAM_STORE|2D|NoExt|4,12,3,2,2, details:edges:387 339(12MB), nodes:291 068(4MB), name:(2MB), geo:960 828(4MB), bounds:-0.5177850019436703,0.33744369456418666,51.28324388600686,51.69833101402963
I can see where the error is thrown here: https://github.com/graphhopper/graphhopper/blob/master/core/src/main/java/com/graphhopper/GraphHopper.java
How can I stop this error from happening?

Runtime partition failed for this job in Hama BSP

I encountered the following problem when starting a Hama BSP job. The exception occurs when Hama tries to load and partition the input data, before it actually runs my own code. This is a known problem discussed on some websites, but unfortunately without a known cause (e.g. see here).
My BSP job works perfectly fine when I only run part of the data set. However, when I run the full data set, the problem occurs :(
How can I resolve or avoid this problem?
13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.FileInputFormat: Total input paths to process : 32
13/11/18 01:19:30 INFO bsp.BSPJobClient: Running job: job_201311180115_0002
13/11/18 01:19:33 INFO bsp.BSPJobClient: Current supersteps number: 0
13/11/18 01:19:33 INFO bsp.BSPJobClient: Job failed.
13/11/18 01:19:33 ERROR bsp.BSPJobClient: Error partitioning the input path.
java.io.IOException: Runtime partition failed for the job.
at org.apache.hama.bsp.BSPJobClient.partition(BSPJobClient.java:465)
at org.apache.hama.bsp.BSPJobClient.submitJobInternal(BSPJobClient.java:333)
at org.apache.hama.bsp.BSPJobClient.submitJob(BSPJobClient.java:293)
at org.apache.hama.bsp.BSPJob.submit(BSPJob.java:228)
at org.apache.hama.bsp.BSPJob.waitForCompletion(BSPJob.java:235)
at edu.wisc.cs.db.opener.hama.ConnectedEntityBspDriver.main(ConnectedEntityBspDriver.java:183)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hama.util.RunJar.main(RunJar.java:146)
After being stuck on this problem for several hours, I found that this error occurs whenever the number of input files is greater than the number of allowed BSP tasks. I think it is probably a bug that Hama should fix in the future.
A quick fix is to increase the maximum number of BSP tasks, specified by the property bsp.tasks.maximum in the hama-site.xml file. For example, the following uses 10 instead of the default of 3:
<property>
  <name>bsp.tasks.maximum</name>
  <value>10</value>
  <description>The maximum number of BSP tasks that will be run simultaneously
    by a groom server.</description>
</property>
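If editing hama-site.xml is inconvenient, the same property can in principle also be set on the job configuration in the driver. Whether the groom servers honor a client-side value depends on the Hama version, so treat this as a hedged sketch rather than a confirmed alternative (the class name is illustrative):
import java.io.IOException;

import org.apache.hama.HamaConfiguration;
import org.apache.hama.bsp.BSPJob;

public class RaiseTaskLimit {
    // Sketch only: mirror the hama-site.xml setting on the job configuration
    // before submission; the groom server's own config may still take precedence.
    public static BSPJob createJob() throws IOException {
        HamaConfiguration conf = new HamaConfiguration();
        conf.set("bsp.tasks.maximum", "10");
        return new BSPJob(conf, RaiseTaskLimit.class);
    }
}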
