I'm building a Spring Boot application that uses the GCP Pub/Sub binder with Spring Cloud Stream to listen for the event GCP posts to a topic when a file lands in a specific bucket. The application then takes the file referenced in the Pub/Sub message, downloads it to the temp directory, processes it, and deletes it from local storage. It successfully processed 371 of the files dumped into the bucket, but 137 files all threw the same error. The following is the method throwing the exception, and the exception itself:
private Optional<List<SingleImport>> load(StorageObjectMessage storageObjectMessage, Storage storage) throws JAXBException {
    ImportLoader loader = new ImportLoader();
    FileConverter converter = new FileConverter();
    Blob blob = storage.get(blobId); // blobId is built elsewhere (not shown in this excerpt)
    Optional<List<SingleImport>> result = Optional.empty();
    log.info(String.format("%s blob with id of %s", blob == null ? "Could not find" : "Found", blobId.getName()));
    if (blob != null) {
        // Temp file name is derived from the message's MD5 hash
        String systemTemporaryDirectory = String.format("%s%s.tmp", System.getProperty("java.io.tmpdir"), storageObjectMessage.getMd5Hash());
        log.info(String.format("Downloading file for processing to temp directory at %s", systemTemporaryDirectory));
        blob.downloadTo(Paths.get(systemTemporaryDirectory));
        File tempFile = new File(systemTemporaryDirectory);
        Import importModel = loader.load(tempFile);
        List<SingleImport> singleImport = converter.convert(importModel);
        result = Optional.of(singleImport);
    }
    return result;
}
2022-03-17 10:43:43.380 INFO 39483 --- [sub-subscriber1] c.h.mro2go.sap.service.FileService : Found blob with id of update/inbound/cat_upd_20220104124110-00028.xml
2022-03-17 10:43:43.380 INFO 39483 --- [sub-subscriber1] c.h.mro2go.sap.service.FileService : Downloading file for processing to temp directory at /var/folders/dk/4k5hcmcj1zv43kx0pgpys8j80000gq/T/jrrpW/f1JEx0uFQDq2xsVQ==.tmp
com.google.cloud.storage.StorageException: /var/folders/dk/4k5hcmcj1zv43kx0pgpys8j80000gq/T/jrrpW/f1JEx0uFQDq2xsVQ==.tmp
at com.google.cloud.storage.Blob.downloadTo(Blob.java:237)
at com.google.cloud.storage.Blob.downloadTo(Blob.java:274)
at com.hdsupply.mro2go.sap.service.FileService.load(FileService.java:90)
at com.hdsupply.mro2go.sap.service.FileService.retrieveNewFile(FileService.java:42)
at com.hdsupply.mro2go.sap.message.SapEventConsumer.processFile(SapEventConsumer.java:56)
at com.hdsupply.mro2go.sap.message.SapEventConsumer.accept(SapEventConsumer.java:35)
at com.hdsupply.mro2go.sap.message.SapEventConsumer.accept(SapEventConsumer.java:1)
at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.invokeConsumer(SimpleFunctionRegistry.java:975)
at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.doApply(SimpleFunctionRegistry.java:704)
at org.springframework.cloud.function.context.catalog.SimpleFunctionRegistry$FunctionInvocationWrapper.apply(SimpleFunctionRegistry.java:550)
at org.springframework.cloud.stream.function.PartitionAwareFunctionWrapper.apply(PartitionAwareFunctionWrapper.java:84)
at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionWrapper.apply(FunctionConfiguration.java:749)
at org.springframework.cloud.stream.function.FunctionConfiguration$FunctionToDestinationBinder$1.handleMessageInternal(FunctionConfiguration.java:581)
at org.springframework.integration.handler.AbstractMessageHandler.handleMessage(AbstractMessageHandler.java:56)
at org.springframework.integration.dispatcher.AbstractDispatcher.tryOptimizedDispatch(AbstractDispatcher.java:115)
at org.springframework.integration.dispatcher.UnicastingDispatcher.doDispatch(UnicastingDispatcher.java:133)
at org.springframework.integration.dispatcher.UnicastingDispatcher.dispatch(UnicastingDispatcher.java:106)
at org.springframework.integration.channel.AbstractSubscribableChannel.doSend(AbstractSubscribableChannel.java:72)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:317)
at org.springframework.integration.channel.AbstractMessageChannel.send(AbstractMessageChannel.java:272)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:187)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:166)
at org.springframework.messaging.core.GenericMessagingTemplate.doSend(GenericMessagingTemplate.java:47)
at org.springframework.messaging.core.AbstractMessageSendingTemplate.send(AbstractMessageSendingTemplate.java:109)
at org.springframework.integration.endpoint.MessageProducerSupport.sendMessage(MessageProducerSupport.java:208)
at com.google.cloud.spring.pubsub.integration.inbound.PubSubInboundChannelAdapter.consumeMessage(PubSubInboundChannelAdapter.java:144)
at com.google.cloud.spring.pubsub.core.subscriber.PubSubSubscriberTemplate.lambda$subscribeAndConvert$1(PubSubSubscriberTemplate.java:173)
at com.google.cloud.pubsub.v1.MessageDispatcher$4.run(MessageDispatcher.java:396)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.nio.file.NoSuchFileException: /var/folders/dk/4k5hcmcj1zv43kx0pgpys8j80000gq/T/jrrpW/f1JEx0uFQDq2xsVQ==.tmp
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:218)
at java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:484)
at java.base/java.nio.file.Files.newOutputStream(Files.java:228)
at com.google.cloud.storage.Blob.downloadTo(Blob.java:234)
... 33 more
What might be causing this to happen at all, let alone only some of the time?
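One detail stands out in the logs above: the temp file name embeds the Base64-encoded MD5 hash, and the standard Base64 alphabet includes '/', which the filesystem treats as a directory separator. The failing path contains exactly such a hash (jrrpW/f1JEx0uFQDq2xsVQ==), so the download targets a subdirectory that does not exist, which would be consistent with the NoSuchFileException occurring only for some files. A minimal sketch of one way to sidestep the naming problem, assuming the rest of the method stays as posted:

// Sketch: let the JDK pick a unique, filesystem-safe temp file name instead
// of embedding the Base64 MD5 hash (which may contain '/').
Path tempPath = Files.createTempFile("import-", ".tmp");
log.info(String.format("Downloading file for processing to %s", tempPath));
blob.downloadTo(tempPath);
File tempFile = tempPath.toFile();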
I am trying to access a remote S3 bucket using Databricks.
From what I have understood, this is what my code should look like:
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", secret_access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3.amazonaws.com")
df = spark.read.format("csv").option("header",True).load('s3a://bucket/path/to/file.csv')
df.show()
I seem to be getting the following error when trying to set the Spark configurations:
Py4JError: An error occurred while calling o371.set. Trace:
py4j.Py4JException: Method set([class java.lang.String, class java.util.ArrayList]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:341)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:349)
at py4j.Gateway.invoke(Gateway.java:286)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:750)
What am I doing wrong?
Check your S3 configuration against the official documentation. Note that the Py4J trace says set([class java.lang.String, class java.util.ArrayList]) does not exist, meaning one of the values you are passing to set is a Python list rather than a string, so make sure both credentials come through as plain strings:
Access_key = dbutils.secrets.get(scope = "Scope_name", key = "aws_access_key")
Secret_key = dbutils.secrets.get(scope = "Scope_name", key = "aws_secret_key")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", Access_key)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", Secret_key)
region = "aws-region-id"
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3." + region + ".amazonaws.com")
conn = sc.textFile("s3a://%s/.../..." % aws_bucket_name)
conn.count()
Alternatively, you can mount the S3 bucket on Databricks, as described in "Mounting AWS S3 buckets on Databricks" by Deepak Rajak.
We're using Hazelcast 4.2.5 in a webapp deployed on Tomcat on Kubernetes, and we're frequently (roughly every 5 seconds) seeing a ClassCastException with the following stack trace in the application logs.
Here's the ClassCastException:
27-Oct-2022 22:57:56.357 WARNING [hz.rogueUsers.cached.thread-2] com.hazelcast.internal.metrics.impl.MetricsCollectionCycle.null Collecting metrics from source com.hazelcast.replicatedmap.impl.ReplicatedMapService failed
java.lang.ClassCastException: class java.lang.String cannot be cast to class com.hazelcast.internal.serialization.impl.HeapData (java.lang.String is in module java.base of loader 'bootstrap'; com.hazelcast.internal.serialization.impl.HeapData is in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader #2f04993d)
at com.hazelcast.replicatedmap.impl.LocalReplicatedMapStatsProvider.getLocalReplicatedMapStats(LocalReplicatedMapStatsProvider.java:85)
at com.hazelcast.replicatedmap.impl.ReplicatedMapService.getLocalReplicatedMapStats(ReplicatedMapService.java:197)
at com.hazelcast.replicatedmap.impl.ReplicatedMapService.getStats(ReplicatedMapService.java:357)
at com.hazelcast.replicatedmap.impl.ReplicatedMapService.provideDynamicMetrics(ReplicatedMapService.java:387)
at com.hazelcast.internal.metrics.impl.MetricsCollectionCycle.collectDynamicMetrics(MetricsCollectionCycle.java:88)
at com.hazelcast.internal.metrics.impl.MetricsRegistryImpl.collect(MetricsRegistryImpl.java:316)
at com.hazelcast.internal.metrics.impl.MetricsService.collectMetrics(MetricsService.java:160)
at com.hazelcast.internal.metrics.impl.MetricsService.collectMetrics(MetricsService.java:154)
at com.hazelcast.spi.impl.executionservice.impl.DelegateAndSkipOnConcurrentExecutionDecorator$DelegateDecorator.run(DelegateAndSkipOnConcurrentExecutionDecorator.java:77)
at com.hazelcast.internal.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:217)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76)
at com.hazelcast.internal.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:102)
Here's how we're setting up Hazelcast.
private static HazelcastInstance setupHazelcastConfig() {
    Config config = new Config();
    config.setInstanceName("rogueUsers");
    NetworkConfig network = config.getNetworkConfig();
    network.setPort(5701).setPortCount(20);
    network.setPortAutoIncrement(true);
    JoinConfig join = network.getJoin();
    join.getMulticastConfig().setEnabled(true);
    // join.getTcpIpConfig()
    //     .setEnabled(true);
    HazelcastInstance hz = Hazelcast.getOrCreateHazelcastInstance(config);
    ReplicatedMapConfig replicatedMapConfig = config.getReplicatedMapConfig("rogueUsers");
    replicatedMapConfig.setInMemoryFormat(InMemoryFormat.BINARY);
    replicatedMapConfig.setAsyncFillup(true);
    replicatedMapConfig.setStatisticsEnabled(true);
    replicatedMapConfig.setSplitBrainProtectionName("splitbrainprotection-name");
    ReplicatedMap<String, String> map = hz.getReplicatedMap("rogueUsers");
    map.addEntryListener(new RogueEntryListener());
    return hz;
}
Is this a configuration issue?
How do I fix this?
Thanks very much,
The exception is being thrown from the following line:
if (isBinary) {
    memoryUsage += ((HeapData) record.getValueInternal()).getHeapCost(); // <-- exception
}
which is line 85 of the com.hazelcast.replicatedmap.impl.LocalReplicatedMapStatsProvider class (per the stack trace above). The condition being checked is the following:
boolean isBinary = (replicatedMapConfig.getInMemoryFormat() == InMemoryFormat.BINARY);
so basically, it is related to the in-memory format in which you are storing the data (in the config above you have chosen BINARY).
However, I don't think you are using it consistently, since your setup code declares the map as ReplicatedMap<String, String> map = hz.getReplicatedMap("rogueUsers");.
From the Javadoc of com.hazelcast.internal.serialization.Data class:
Data is basic unit of serialization. It stores binary form of an object serialized by SerializationService.toData(Object).
Therefore, try editing your config to this:
ReplicatedMap<Data, Data> map = hz.getReplicatedMap("rogueUsers");
The DataStream job:
public static final FunctionType DEVICE = new FunctionType("com.github.f1xman.era.anomalydetection.device", "DeviceFunction");

public static void main(String[] args) throws Exception {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StatefulFunctionsConfig statefunConfig = StatefulFunctionsConfig.fromEnvironment(env);
    statefunConfig.setFactoryType(MessageFactoryType.WITH_KRYO_PAYLOADS);
    DataStreamSource<String> names = env.addSource(new NamesSourceFunction());
    DataStream<RoutableMessage> namesIngress = names.map(name -> RoutableMessageBuilder.builder()
            .withTargetAddress(DEVICE, name)
            .withMessageBody(name)
            .build());
    StatefulFunctionDataStreamBuilder.builder("example")
            .withDataStreamAsIngress(namesIngress)
            .withRequestReplyRemoteFunction(
                    requestReplyFunctionBuilder(DEVICE, URI.create("http://localhost:8080/statefun"))
            )
            .withConfiguration(statefunConfig)
            .build(env);
    env.execute("Flink Streaming Java API Skeleton");
}
When a String value is passed to .withMessageBody(...), the following exception occurs:
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:137)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:258)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1389)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
at akka.dispatch.OnComplete.internal(Future.scala:300)
at akka.dispatch.OnComplete.internal(Future.scala:297)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
at scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:60)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala)
at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:68)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:284)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:621)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:532)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
at scala.concurrent.impl.CallbackRunnable.run$$$capture(Promise.scala:60)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:81)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
at java.util.concurrent.ForkJoinTask.doExec$$$capture(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:175)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:252)
at org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:242)
at org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:233)
at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:684)
at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:444)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:316)
at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:314)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:217)
at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:78)
at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:163)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
at akka.actor.ActorCell.receiveMessage$$$capture(ActorCell.scala:580)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala)
at akka.actor.ActorCell.invoke(ActorCell.scala:548)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
... 5 more
Caused by: org.apache.flink.statefun.flink.core.functions.StatefulFunctionInvocationException: An error occurred when attempting to invoke function FunctionType(com.github.f1xman.era.anomalydetection.device, DeviceFunction).
at org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:50)
at org.apache.flink.statefun.flink.core.functions.ReusableContext.apply(ReusableContext.java:74)
at org.apache.flink.statefun.flink.core.functions.LocalFunctionGroup.processNextEnvelope(LocalFunctionGroup.java:60)
at org.apache.flink.statefun.flink.core.functions.Reductions.processEnvelopes(Reductions.java:164)
at org.apache.flink.statefun.flink.core.functions.Reductions.apply(Reductions.java:149)
at org.apache.flink.statefun.flink.core.functions.FunctionGroupOperator.processElement(FunctionGroupOperator.java:90)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.pushToOperator(CopyingChainingOutput.java:82)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:57)
at org.apache.flink.streaming.runtime.tasks.CopyingChainingOutput.collect(CopyingChainingOutput.java:29)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:56)
at org.apache.flink.streaming.api.operators.CountingOutput.collect(CountingOutput.java:29)
at org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.sendDownstream(FeedbackUnionOperator.java:180)
at org.apache.flink.statefun.flink.core.feedback.FeedbackUnionOperator.processElement(FeedbackUnionOperator.java:86)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:496)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:203)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:809)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:761)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.flink.statefun.sdk.reqreply.generated.TypedValue
at org.apache.flink.statefun.flink.core.reqreply.RequestReplyFunction.invoke(RequestReplyFunction.java:118)
at org.apache.flink.statefun.flink.core.functions.StatefulFunction.receive(StatefulFunction.java:48)
... 25 more
Sending a String value to an embedded function, though, works well. The workaround I've found is to wrap the value in a TypedValue:
.withMessageBody(TypedValue.newBuilder()
        .setValue(ByteString.copyFrom(name, StandardCharsets.UTF_8))
        .setHasValue(true)
        .setTypename("example/Name")
        .build()
)
This approach requires the receiver function to unwrap the TypedValue and deserialize the ByteString. That looks too low-level for this kind of API, and I believe it is the wrong way to use the Stateful Functions SDK for Flink DataStream integration. What is the correct way to implement interoperability between a Flink DataStream and remote Stateful Functions?
The job is inspired by the official examples.
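Until a cleaner approach surfaces, the TypedValue wrapping can at least be factored out of the pipeline code. A small sketch based on the workaround above (the helper name is hypothetical; "example/Name" is the same placeholder typename used there):

// Hypothetical helper that centralizes the TypedValue wrapping shown in the
// workaround above.
private static TypedValue utf8TypedValue(String value) {
    return TypedValue.newBuilder()
            .setValue(ByteString.copyFrom(value, StandardCharsets.UTF_8))
            .setHasValue(true)
            .setTypename("example/Name")
            .build();
}

The map call then becomes .withMessageBody(utf8TypedValue(name)), though the receiving function still has to unwrap the TypedValue itself.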
I'm trying to use a savepoint on a job in which I have implemented a customized parallelizable socket source. The source looks something like this:
@Override
public void run(SourceContext<String> sourceContext) throws Exception {
    int idx = getRuntimeContext().getIndexOfThisSubtask();
    String[] hosts = (config.hostsStr).split(":");
    String[] portStrArr = (config.portsStr).split(":");
    int[] ports = new int[portStrArr.length];
    for (int i = 0; i < portStrArr.length; i++) {
        ports[i] = Integer.parseInt(portStrArr[i]);
    }
    Socket s = new Socket(hosts[idx], ports[idx]);
    BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
    //ois = new ObjectInputStream(s.getInputStream());
    while (running) {
        String str = in.readLine();
        sourceContext.collect(str);
    }
    sourceContext.close();
}

@Override
public void cancel() {
    running = false;
}
The exception on the cluster looks something like this:
flink-1.1.3/bin//flink cancel -s hdfs://flink-master:19000/flink-checkpoints a18499a80099045eb5120ecacdabd421
Retrieving JobManager.
Using address flink-master/10.0.0.16:6123 to connect to JobManager.
Cancelling job a18499a80099045eb5120ecacdabd421 with savepoint to hdfs://flink-master:19000/flink-checkpoints.
java.lang.Exception: Canceling the job with ID a18499a80099045eb5120ecacdabd421 failed.
at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:637)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1092)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130)
Caused by: java.lang.Exception: Failed to trigger savepoint.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:639)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:629)
at org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:245)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1347)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
Suppressed: java.lang.IllegalArgumentException: Self-suppression not permitted
at java.lang.Throwable.addSuppressed(Throwable.java:1043)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.storeSavepointToHandle(SavepointStore.java:207)
at org.apache.flink.runtime.checkpoint.savepoint.SavepointStore.storeSavepointToHandle(SavepointStore.java:150)
at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpointExternalized(PendingCheckpoint.java:281)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:888)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:813)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply$mcV$sp(JobManager.scala:1462)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply(JobManager.scala:1461)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$1.apply(JobManager.scala:1461)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[CIRCULAR REFERENCE:java.io.EOFException: Premature EOF: no length prefix available]
On my local machine, the savepoint is rejected with the following exception:
Cancelling job 4c99e0220c8c4683d1287269073b5c2c with savepoint to savepoints/.
java.lang.Exception: Canceling the job with ID 4c99e0220c8c4683d1287269073b5c2c failed.
at org.apache.flink.client.CliFrontend.cancel(CliFrontend.java:637)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1092)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130)
Caused by: java.lang.Exception: Failed to trigger savepoint.
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:639)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$6.apply(JobManager.scala:629)
at org.apache.flink.runtime.concurrent.impl.FlinkFuture$5.onComplete(FlinkFuture.java:272)
at akka.dispatch.OnComplete.internal(Future.scala:247)
at akka.dispatch.OnComplete.internal(Future.scala:245)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:175)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:172)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.Exception: Checkpoint was declined (tasks not ready)
at org.apache.flink.runtime.checkpoint.PendingCheckpoint.abortDeclined(PendingCheckpoint.java:510)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveDeclineMessage(CheckpointCoordinator.java:735)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$2.apply$mcV$sp(JobManager.scala:1491)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$2.apply(JobManager.scala:1490)
at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$handleCheckpointMessage$2.apply(JobManager.scala:1490)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
... 6 more
Is it because my source cannot be stopped properly, so that the checkpoint cannot happen? On the cluster it does say the savepoint was successful and returns its location, but there is no file at that path.
Given the source function excerpt, it almost looks good to me. What you should do is output elements under the checkpoint lock; otherwise you might run into problems when an element is output at the same time as a checkpoint is triggered. The lock returned by SourceContext#getCheckpointLock makes sure that these two operations don't happen concurrently.
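Applied to the run method from the question, only the emission changes (a sketch; variable names as in the excerpt):

while (running) {
    String str = in.readLine();
    // Emit under the checkpoint lock so that collecting an element and
    // triggering a checkpoint cannot happen concurrently.
    synchronized (sourceContext.getCheckpointLock()) {
        sourceContext.collect(str);
    }
}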
The first error looks a little as if you have a problem on the HDFS side. Could you check whether the HDFS logs contain anything suspicious? Maybe the data nodes ran out of disk space.
The second exception indicates that something went wrong while doing the checkpoint. The JobManager logs should contain a log statement saying why the checkpoint has failed. It should have the format: Discarding checkpoint CHECKPOINT_ID because of checkpoint decline from task EXECUTION_ID : REASON.
I am using the following Flume configuration:
ClouderaTwitterAgent.sources = Twitter
ClouderaTwitterAgent.channels = MemChannel
ClouderaTwitterAgent.sinks = HDFS
ClouderaTwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
ClouderaTwitterAgent.sources.Twitter.channels = MemChannel
ClouderaTwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxx
ClouderaTwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxx
ClouderaTwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxxx
ClouderaTwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxx
ClouderaTwitterAgent.sources.Twitter.keywords = Sully
ClouderaTwitterAgent.sinks.HDFS.channel = MemChannel
ClouderaTwitterAgent.sinks.HDFS.type = hdfs
ClouderaTwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/tweets
ClouderaTwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
ClouderaTwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
ClouderaTwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
ClouderaTwitterAgent.sinks.HDFS.hdfs.rollSize = 0
ClouderaTwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
ClouderaTwitterAgent.channels.MemChannel.type = memory
ClouderaTwitterAgent.channels.MemChannel.capacity = 10000
ClouderaTwitterAgent.channels.MemChannel.transactionCapacity = 100
and this is the command that I am using to run Flume:
`bin/flume-ng agent --conf ./conf/ -f conf/flume-cloudera.conf -Dflume.root.logger=DEBUG,console -n ClouderaTwitterAgent`
and this is the error that I am getting:
2016-09-20 14:53:14,245 (Twitter4J Async Dispatcher[0]) [DEBUG - com.cloudera.flume.source.TwitterSource$1.onStatus(TwitterSource.java:121)] tweet arrived
2016-09-20 14:53:16,073 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://localhost:9000/user/tweets/FlumeData.1474363316543.tmp
2016-09-20 14:53:16,113 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at org.apache.hadoop.security.Groups.<init>(Groups.java:64)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:232)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:718)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:703)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:605)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2554)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2546)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2412)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
... 21 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.security.JniBasedUnixGroupsMapping
at org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.<init>(JniBasedUnixGroupsMappingWithFallback.java:38)
... 26 more
2016-09-20 14:53:16,121 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
Can anyone please help me? I am new to this; I got this configuration from the internet, followed every instruction, and have searched for a solution to this problem.
Here is what I have already tried:
1. Checked my system clock
2. Removed the HDFS folder and then made a new folder
3. Formatted the namenode
4. Restarted the agent several times