Deque Full Exception - Thingsboard - java

I'm having trouble using the Thingsboard platform (IoT) when simulating 7.5K devices sending data to the platform. I get the following error in the logs as soon as I start sending data (over MQTT):
2020-08-01 01:17:06,946 [ForkJoinPool-12-worker-0] ERROR c.g.c.u.concurrent.AggregateFuture - Got more than one input Future failure. Logging failures after the first
java.lang.IllegalStateException: Deque full
at java.util.concurrent.LinkedBlockingDeque.addLast(LinkedBlockingDeque.java:335)
at java.util.concurrent.LinkedBlockingDeque.add(LinkedBlockingDeque.java:633)
at org.thingsboard.server.dao.util.AbstractBufferedRateExecutor.submit(AbstractBufferedRateExecutor.java:109)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsync(CassandraAbstractDao.java:93)
at org.thingsboard.server.dao.nosql.CassandraAbstractDao.executeAsyncWrite(CassandraAbstractDao.java:76)
at org.thingsboard.server.dao.timeseries.CassandraBaseTimeseriesDao.savePartition(CassandraBaseTimeseriesDao.java:434)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.saveAndRegisterFutures(BaseTimeseriesService.java:153)
at org.thingsboard.server.dao.timeseries.BaseTimeseriesService.save(BaseTimeseriesService.java:144)
at org.thingsboard.server.service.telemetry.DefaultTelemetrySubscriptionService.saveAndNotify(DefaultTelemetrySubscriptionService.java:124)
at org.thingsboard.rule.engine.telemetry.TbMsgTimeseriesNode.onMsg(TbMsgTimeseriesNode.java:89)
at org.thingsboard.server.actors.ruleChain.RuleNodeActorMessageProcessor.onRuleChainToRuleNodeMsg(RuleNodeActorMessageProcessor.java:107)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.onRuleChainToRuleNodeMsg(RuleNodeActor.java:97)
at org.thingsboard.server.actors.ruleChain.RuleNodeActor.doProcess(RuleNodeActor.java:60)
at org.thingsboard.server.actors.service.ContextAwareActor.process(ContextAwareActor.java:45)
at org.thingsboard.server.actors.TbActorMailbox.processMailbox(TbActorMailbox.java:121)
at java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1402)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
I have tried googling the reason behind it, but I haven't found anything.
While simulating with 5K devices, this error appeared about 3 times a day (over a 4-day period), but it eventually stopped. However, after increasing the number of devices, the error is constant. I'm using Kafka as the broker, but I don't see any Kafka-related errors. I just want to know why the error appears: is it related to memory, or to some other limit?
Thanks in advance
Francisco P
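For readers hitting the same trace: the top frames show AbstractBufferedRateExecutor.submit calling LinkedBlockingDeque.add, and add throws IllegalStateException("Deque full") when a bounded deque is already at capacity. That suggests tasks are being queued faster than they are drained, rather than a memory problem, although the trace alone does not prove it. Below is a minimal sketch of that JDK behaviour only. The class and method names (DequeFullDemo, countRejected, offerWhenFull) are hypothetical and are not Thingsboard code.

```java
import java.util.concurrent.LinkedBlockingDeque;

public class DequeFullDemo {

    /**
     * Adds `attempts` items to a deque bounded at `capacity` while nothing
     * consumes from it, and returns how many add() calls were rejected.
     */
    public static int countRejected(int capacity, int attempts) {
        LinkedBlockingDeque<Integer> queue = new LinkedBlockingDeque<>(capacity);
        int rejected = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                // add() throws IllegalStateException("Deque full") at capacity
                queue.add(i);
            } catch (IllegalStateException e) {
                rejected++;
            }
        }
        return rejected;
    }

    /** offer() is the non-throwing variant: it returns false when the deque is full. */
    public static boolean offerWhenFull(int capacity) {
        LinkedBlockingDeque<Integer> queue = new LinkedBlockingDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            queue.add(i);
        }
        return queue.offer(-1);
    }

    public static void main(String[] args) {
        System.out.println("rejected = " + countRejected(2, 5));
        System.out.println("offer on full deque = " + offerWhenFull(2));
    }
}
```

With a capacity of 2 and 5 attempts, three add() calls are rejected, which mirrors a producer outpacing the consumer of a bounded buffer.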

Related

How to avoid "failed to send operations" errors after switching to the "exactly-once" delivery strategy?

I recently tried to switch my subscriptions in GCP Pub/Sub to the "exactly-once" delivery strategy. However, I started seeing the following warnings ~10 times every 30 minutes in my application logs:
com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:92)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:98)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:66)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:67)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1041)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1215)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:983)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:771)
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:574)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:544)
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
at com.google.api.gax.grpc.ChannelPool$ReleasingClientCall$1.onClose(ChannelPool.java:535)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:563)
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:70)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:744)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:723)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Some acknowledgement ids in the request were invalid. This could be because the acknowledgement ids have expired or the acknowledgement ids were malformed.
at io.grpc.Status.asRuntimeException(Status.java:535)
... 17 more
They're immediately followed by the following INFO log messages in the same thread:
Permanent error invalid ack id message, will not resend
I didn't see any problems caused by these warnings, but it's a bit hard to tell because my application is processing a decent number of messages (~1000/hour). I initially thought that these warnings were just short-term "aftershocks" from switching to the "exactly-once" strategy. However, I waited for about 2 hours and they kept occurring with the same frequency, showing no sign of disappearing. I then disabled the "exactly-once" strategy and they went away immediately after. Can anyone tell me whether these warnings are dangerous, what side effects I can expect, and most importantly how I can get rid of them?
I'm using version 3.4.0 of spring-cloud-gcp-dependencies and spring-cloud-gcp-starter-pubsub. I'm also using Spring Cloud Stream to process the incoming messages and I rely on it to automatically acknowledge the messages.
I have the following configuration set in my application.yaml file:
spring:
  cloud:
    gcp:
      pubsub:
        subscriber:
          executor-threads: 15
          max-ack-extension-period: 23400 # 6 hours and 30 minutes
          acknowledgement-deadline: 600 # Maximum value
For context: The messages in my application represent jobs for execution, and they could take quite a while to finish - hence the 6h30m maximum acknowledgement extension period.
I also saw the following StackOverflow question:
How to handle errors during message acknowledgement using google pubsub java library?
From what I understand, the consequence of these warnings is that the messages will be redelivered to my application, but this is exactly what I want to avoid.
Thanks for the question, Alexander.
The errors you are seeing occur when a modifyAckDeadline or acknowledge request to the service fails because the acknowledgement ID has already expired. In such cases, the service treats the expired acknowledgement ID as invalid, since a newer delivery might already be in flight. This is in line with the guarantees of exactly-once delivery. There could be a few reasons for it:
The request was delayed by the network, and by the time it arrived at the server the acknowledgement ID lease had already expired.
The task issuing the modifyAckDeadline or acknowledge request is overwhelmed (high CPU/memory/network usage), which delays those requests.
I suggest setting min-duration-per-ack-extension to a high number to reduce the issues mentioned above. This helps mitigate the cases where the acknowledgement ID lease expires. The highest value you can set for this field is 600 seconds.
Additionally, as mentioned in the other Stack Overflow question, you should consider checking the response of your acknowledgement operations. Your application can use it to tell whether it should expect a redelivery.

Nativescript angular app breaking with "java.lang.IllegalStateException: Too many receivers, total of 1000, registered for pid"

I've created a NativeScript application that uses the nativescript-barcodescanner plugin to scan and decode QR codes. The application is intended to scan a lot of QR codes, though not in immediate succession. However, after 1000 scans, the application crashes with the following exception.
java.lang.RuntimeException: Unable to resume activity {org.nativescript.test/com.google.zxing.client.android.CaptureActivity}: java.lang.IllegalStateException: Too many receivers, total of 1000, registered for pid: 15623, callerPackage: org.nativescript.test
at android.app.ActivityThread.performResumeActivity(ActivityThread.java:4021)
at android.app.ActivityThread.handleResumeActivity(ActivityThread.java:4053)
at android.app.servertransaction.ResumeActivityItem.execute(ResumeActivityItem.java:51)
at android.app.servertransaction.TransactionExecutor.executeLifecycleState(TransactionExecutor.java:145)
at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:70)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1955)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:214)
at android.app.ActivityThread.main(ActivityThread.java:7078)
at java.lang.reflect.Method.invoke(Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:494)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:975)
Caused by: java.lang.IllegalStateException: Too many receivers, total of 1000, registered for pid: 15623, callerPackage: org.nativescript.test
at android.os.Parcel.createException(Parcel.java:1974)
at android.os.Parcel.readException(Parcel.java:1934)
at android.os.Parcel.readException(Parcel.java:1884)
at android.app.IActivityManager$Stub$Proxy.registerReceiver(IActivityManager.java:3684)
at android.app.ContextImpl.registerReceiverInternal(ContextImpl.java:1567)
at android.app.ContextImpl.registerReceiver(ContextImpl.java:1528)
at android.app.ContextImpl.registerReceiver(ContextImpl.java:1516)
at android.content.ContextWrapper.registerReceiver(ContextWrapper.java:636)
at com.google.zxing.client.android.InactivityTimer.onResume(InactivityTimer.java:69)
at com.google.zxing.client.android.CaptureActivity.onResume(CaptureActivity.java:222)
at android.app.Instrumentation.callActivityOnResume(Instrumentation.java:1416)
at android.app.Activity.performResume(Activity.java:7609)
at android.app.ActivityThread.performResumeActivity(ActivityThread.java:4013)
... 11 more
Caused by: android.os.RemoteException: Remote stack trace:
at com.android.server.am.ActivityManagerService.registerReceiver(ActivityManagerService.java:25447)
at android.app.IActivityManager$Stub.onTransact$registerReceiver$(IActivityManager.java:10896)
at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:126)
at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:4162)
at android.os.Binder.execTransact(Binder.java:739)
After debugging the issue, this is what I've found: calling the BarcodeScanner.scan function offered by the nativescript-barcodescanner plugin starts an Activity for result, and the Google zxing library (used by this plugin) then registers a receiver. The problem seems to be that after a successful scan, when the Activity returns the result, the zxing library does not do the necessary cleanup, leaving dangling receivers registered. Note that when the scanner is closed by pressing the back button, the cleanup is carried out as expected.
Any help about going around this issue would be appreciated.
Edit: Added link to github repo for reproduction
https://github.com/jeanpaulattard/nativescript-barcodescanner-demo

Azure Spring Boot ZipException

I've got a spring boot rest service with embedded tomcat deployed on an Azure App Service and I am experiencing intermittent outages, every few weeks.
Every time there is an outage the logs contain the following entries:
Message: java.util.zip.ZipException: ZIP_Read: error reading zip file ZIP_Read: error reading zip file
Exception type: java.util.zip.ZipException
Failed method: java.util.zip.ZipFile.access$1400
With the following call stack.
java.lang.IllegalStateException:
at org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries (JarWarResourceSet.java133)
at org.apache.catalina.webresources.AbstractArchiveResourceSet.getResource (AbstractArchiveResourceSet.java256)
at org.apache.catalina.webresources.StandardRoot.getResourceInternal (StandardRoot.java281)
at org.apache.catalina.webresources.CachedResource.validateResource (CachedResource.java97)
at org.apache.catalina.webresources.Cache.getResource (Cache.java69)
at org.apache.catalina.webresources.StandardRoot.getResource (StandardRoot.java216)
at org.apache.catalina.webresources.StandardRoot.getResource (StandardRoot.java206)
at org.apache.catalina.mapper.Mapper.internalMapWrapper (Mapper.java1027)
at org.apache.catalina.mapper.Mapper.internalMap (Mapper.java842)
at org.apache.catalina.mapper.Mapper.map (Mapper.java698)
at org.apache.catalina.connector.CoyoteAdapter.postParseRequest (CoyoteAdapter.java679)
at org.apache.catalina.connector.CoyoteAdapter.service (CoyoteAdapter.java336)
at org.apache.coyote.http11.Http11Processor.service (Http11Processor.java803)
at org.apache.coyote.AbstractProcessorLight.process (AbstractProcessorLight.java66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process (AbstractProtocol.java868)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun (NioEndpoint.java1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run (SocketProcessorBase.java49)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run (TaskThread.java61)
at java.lang.Thread.run (Thread.java748)
Inner exception java.util.zip.ZipException handled at org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries:
at java.util.zip.ZipFile.access$1400 (ZipFile.java60)
at java.util.zip.ZipFile$ZipFileInputStream.read (ZipFile.java734)
at java.io.FilterInputStream.read (FilterInputStream.java133)
at java.io.PushbackInputStream.read (PushbackInputStream.java186)
at java.util.zip.ZipInputStream.readFully (ZipInputStream.java403)
at java.util.zip.ZipInputStream.readLOC (ZipInputStream.java278)
at java.util.zip.ZipInputStream.getNextEntry (ZipInputStream.java122)
at java.util.jar.JarInputStream.<init> (JarInputStream.java83)
at java.util.jar.JarInputStream.<init> (JarInputStream.java62)
at org.apache.catalina.webresources.TomcatJarInputStream.<init> (TomcatJarInputStream.java37)
at org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries (JarWarResourceSet.java108)
at org.apache.catalina.webresources.AbstractArchiveResourceSet.getResource (AbstractArchiveResourceSet.java256)
at org.apache.catalina.webresources.StandardRoot.getResourceInternal (StandardRoot.java281)
at org.apache.catalina.webresources.CachedResource.validateResource (CachedResource.java97)
at org.apache.catalina.webresources.Cache.getResource (Cache.java69)
at org.apache.catalina.webresources.StandardRoot.getResource (StandardRoot.java216)
at org.apache.catalina.webresources.StandardRoot.getResource (StandardRoot.java206)
at org.apache.catalina.mapper.Mapper.internalMapWrapper (Mapper.java1027)
at org.apache.catalina.mapper.Mapper.internalMap (Mapper.java842)
at org.apache.catalina.mapper.Mapper.map (Mapper.java698)
at org.apache.catalina.connector.CoyoteAdapter.postParseRequest (CoyoteAdapter.java679)
at org.apache.catalina.connector.CoyoteAdapter.service (CoyoteAdapter.java336)
at org.apache.coyote.http11.Http11Processor.service (Http11Processor.java803)
at org.apache.coyote.AbstractProcessorLight.process (AbstractProcessorLight.java66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process (AbstractProtocol.java868)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun (NioEndpoint.java1459)
at org.apache.tomcat.util.net.SocketProcessorBase.run (SocketProcessorBase.java49)
at java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run (TaskThread.java61)
at java.lang.Thread.run (Thread.java748)
As soon as this occurs, all calls to the REST service return a 500 and the log gets another entry like the one above. This continues until I manually reboot the app service.
I'm struggling to figure out what the issue is. Googling the exception only returns results for problems with opening zip files. The app does not do any zipping or unzipping itself, and the call stack seems to indicate that it's something inside Tomcat (which could be caused by something I've done on Azure, in Spring Boot, in the JVM, or something else entirely).
CPU or memory usage appear to be fine preceding the outages so that doesn't seem to be a factor.
This issue is not preceded by any deployments or platform changes.
I'm stumped regarding what to do next, if anyone can point me in the right direction to investigate, it'll be much appreciated.

HP Fortify 3.80: an internal error has occurred

My upload token expired and upon executing
./fortifyclient token -gettoken AnalysisUploadToken -url "http://<localhost>/ssc" -user ssc_upload
I receive
An internal error has occurred.
A JAXB unmarshalling exception;
nested exception is javax.xml.bind.UnmarshalException: unexpected element
I would show the rest, however it is approx. 200 lines.
The last time this happened (90 days ago), I used the 4.00 version of ./fortifyclient and it worked.
Any suggestions?
Is the time synchronized between your client and server? I think that any operation with fortifyclient will fail if the time on the client and server differs by more than 5 or 10 minutes.
Check the date and time zone as well.

Building a storm topology with shell bolts

I'm currently trying to implement a Storm topology that integrates with the R language.
As a starting point, I took the following project (https://github.com/allenday/R-Storm), which works by extending the ShellBolt class to implement R integration and provides an R library to handle communication between the Java and R sides.
My problem is that if I create a topology based on regular (Java-only) bolts, I can chain them together without issue. However, when one of the bolts in the middle of the chain is an R Shell Bolt, the whole thing falls apart with:
5661 [Thread-18] ERROR backtype.storm.util - Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
Shell Process Exception:
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.daemon.executor$fn__3557$fn__3569$fn__3616.invoke(executor.clj:715) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip16.jar:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
More concretely, the following topology works as expected:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
Where PermuteBolt is an R Shell Bolt. The logs for this example show the expected output:
6246 [Thread-18] INFO backtype.storm.daemon.task - Emitting: spout default [four score and seven years ago]
6246 [Thread-16] INFO backtype.storm.daemon.executor - Processing received message source: spout:3, stream: default, id: {}, [four score and seven years ago]
6261 [Thread-23] INFO backtype.storm.daemon.task - Emitting: permutebolt default ["PERMUTE seven years ago and four score"]
If, however, I add another bolt that gets its data from the first one, such as:
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
builder.setBolt("identity", new IdentityBolt(new Fields("identity")), 1).fieldsGrouping("permutebolt", new Fields("permutation"));
It fails with the trace printed above. What's also weird is that this second, failing example is included with the project.
Is this an issue anyone has faced before?
UPDATE: I noticed this only occurs when using R Shell Bolts. I have since tried launching bolts that use Python scripts and have been able to chain them normally.
@andrei, this is fixed in 1.01 uploaded to github today:
https://github.com/allenday/R-Storm/releases/tag/v1.01
It has been submitted to CRAN and will be available soon.
Thanks for reporting.
-Allen
