Huge performance issue using camel routes in karaf - java

I have a tricky issue with Karaf, and having tried all day to fix it, I need your insights. Here is the problem:
I have Camel routes (pure Java DSL) that get data from two sources, process it, and then send the results to Redis:
- when run as a standalone application (a Main class launched with "java -jar myjar.jar" on the command line), the data is processed and saved in less than 20 minutes
- when run as a bundle (part of another feature, actually) on the same machine, it takes about 10 hours.
EDIT: I forgot to add that I use Camel 2.1.0 and Karaf 2.3.2.
Now, we are in the process of refactoring our SI to Karaf features, so sadly it's not really possible to just keep the standalone app.
I tried playing with the Karaf Java memory options, using a cluster (I failed :D), playing with SEDA and thread pools, and replacing all direct routes with SEDA, all without success. A dev:create-dump shows a lot of
thread #38 - Split" Id=166 BLOCKED on java.lang.Class#56d1396f owned by "Camel (camelRedisProvisioning)
Could it be an issue with split and parallelProcessing in Karaf? The standalone app indeed shows a LOT more CPU activity.
Here are my Camel routes:
//start with a Quartz endpoint and a cron tab
from("quartz://provisioning/topOffersStart?cron=" + cronValue.replace(' ', '+'))
    .multicast().parallelProcessing()
    .to("direct:prodDAO", "direct:thesaurus");

//get data from the two sources and process it
from("direct:prodDAO")
    .bean(ProductsDAO.class)
    .setHeader("_type", constant(TopExport.PRODUCT_TOP))
    .setHeader("topOffer", constant("topOffer"))
    .to("direct:topOffers");

from("direct:thesaurus")
    .to(thesaurusUri)
    .unmarshal(csv)
    .bean(ThesaurusConverter.class, "convert")
    .setHeader("_type", constant(TopExport.CATEGORY_TOP))
    .setHeader("topOffer", constant("topOffer"))
    .to("direct:topOffers");

//processing
from("direct:topOffers")
    .choice()
        .when(isCategory)
            .to("direct:topOffersThesaurus")
        .otherwise()
            .when(isProduct)
                .to("direct:topOffersProducts")
            .otherwise()
                .log(LoggingLevel.ERROR, "${header[_type]} is not valid !")
            .endChoice()
    .endChoice()
    .end();

from("direct:topOffersThesaurus")
    //here is where I think the problem comes from
    .split(body()).parallelProcessing().streaming()
    .bean(someprocessing)
    .to("direct:toRedis");

from("direct:topOffersProducts")
    //here is where I think the problem comes from
    .split(body()).parallelProcessing().streaming()
    .bean(someprocessing)
    .to("direct:toRedis");

//save into Redis
from("direct:toRedis")
    .setHeader("CamelRedis.Key", simple("provisioning:${header[_topID]}"))
    .setHeader("CamelRedis.Command", constant("SETEX"))
    .setHeader("CamelRedis.Timeout", constant("90000")) //25h
    .setHeader("CamelRedis.Value", simple("${body}"))
    .to("spring-redis://?redisTemplate=#provisioningRedisTemplateStringSerializer");
NB: the body sent to direct:topOffers[Products|Thesaurus] is a list of POJOs (of the same class).
Thanks to anyone who can help.
EDIT:
I think I narrowed it down to a deadlock on JAXB. Indeed, in my routes I make lots of calls to a Java client that calls a web service. When running in Karaf, threads are blocked there:
java.lang.Thread.State: BLOCKED (on object monitor) at com.sun.xml.bind.v2.runtime.reflect.opt.AccessorInjector.prepare(AccessorInjector.java:78)
Further down the stack trace, we see the unmarshalling method used to transform the XML into objects. These two lines look suspect to me:
final JAXBContext context = JAXBContext.newInstance(clazz.getPackage().getName());
final Unmarshaller um = context.createUnmarshaller();
I removed the final keywords, with no improvement. Maybe it has something to do with the JAXB used by Karaf? I do not install any JAXB implementation with the bundle.

Nailed it!
As seen above, it was indeed linked to a deadlock on the JAXB context in my web service client.
What I did:
- refactored the old client code by removing the final keyword on the Marshaller/Unmarshaller objects (I think the deadlock came from there, even though it was the exact same code when running standalone)
- instantiated the context based on the package, and only once (see the sketch below).
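For the record, here is a minimal sketch of the shape of the fix (MyResponse is a placeholder for one of the generated classes in the target package; the two-argument newInstance is the workaround for the OSGi classloader issue from the linked question):
import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

public final class WsClient {
    // JAXBContext is thread-safe but expensive to build: create it once
    // per package instead of once per web service call.
    private static final JAXBContext CONTEXT;
    static {
        try {
            // Pass the bundle's classloader explicitly so JAXB can find
            // jaxb.index / ObjectFactory under OSGi.
            CONTEXT = JAXBContext.newInstance(
                    MyResponse.class.getPackage().getName(),
                    MyResponse.class.getClassLoader());
        } catch (JAXBException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Unmarshaller is NOT thread-safe: create a fresh (cheap) one per call
    // instead of sharing a single instance across Camel's threads.
    public MyResponse unmarshal(InputStream in) throws JAXBException {
        Unmarshaller um = CONTEXT.createUnmarshaller();
        return (MyResponse) um.unmarshal(in);
    }
}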
I must admit classloader issues with OSGi had me banging my head on my desk for a few hours, but thanks to Why can't JAXB find my jaxb.index when running inside Apache Felix?, I managed to fix that.
Granted, my threads are now sleeping instead of blocked, but I now process my data in less than 30 minutes, so that's good enough for me.

Related

akka.pattern.AskTimeoutException while running Lagom HelloWorld example

I ran into a problem while trying my hand at the Hello World example explained here.
Kindly note that I have just modified the HelloEntity.java file to be able to return something other than "Hello, World!". Most certainly my changes are taking time, and hence I am getting the timeout error below.
I am currently trying this (doing a PoC) on a single node to understand the Lagom framework and do not have the liberty to deploy multiple nodes.
I have also tried modifying the default lagom.circuit-breaker in application.conf ("call-timeout = 100s"); however, this does not seem to have helped.
Following is the exact error message for your reference:
{"name":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".","detail":"akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://hello-impl-application/system/sharding/HelloEntity#1074448247]] after [5000 ms]. Sender[null] sent message of type \"com.lightbend.lagom.javadsl.persistence.CommandEnvelope\".\n\tat akka.pattern.PromiseActorRef$.$anonfun$defaultOnTimeout$1(AskSupport.scala:595)\n\tat akka.pattern.PromiseActorRef$.$anonfun$apply$1(AskSupport.scala:605)\n\tat akka.actor.Scheduler$$anon$4.run(Scheduler.scala:140)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:866)\n\tat scala.concurrent.BatchingExecutor.execute(BatchingExecutor.scala:109)\n\tat scala.concurrent.BatchingExecutor.execute$(BatchingExecutor.scala:103)\n\tat scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:864)\n\tat akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:328)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:279)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:283)\n\tat akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:235)\n\tat java.lang.Thread.run(Thread.java:748)\n"}
Question: Is there a way to increase the Akka timeout by modifying application.conf or any of the Java source files in the Hello World project? Can you please help me with the exact details?
Thanks in advance for your time and help.
The call timeout is the timeout for circuit breakers, which is configured using lagom.circuit-breaker.default.call-timeout. But that's not what is timing out above; what is timing out is the request to your HelloEntity, and that timeout is configured using lagom.persistence.ask-timeout. The reason there is a timeout on requests to entities is that in a multi-node environment your entities are sharded across nodes, so an ask on them may go to another node; the timeout is needed in case that node is not responding.
All that said, I don't think changing the ask-timeout will solve your problem. If you have a single node, then your entities should respond instantly if everything is working ok.
Is that the only error you're seeing in the logs?
Are you seeing this in dev mode (i.e., using the runAll command), or are you running the Lagom service some other way?
Is your database responding?
Thanks James for the help/pointer.
Adding the following lines to resources/application.conf did the trick for me:
lagom.persistence.ask-timeout = 30s
hello {
  ..
  ..
  call-timeout = 30s
  call-timeout = ${?CIRCUIT_BREAKER_CALL_TIMEOUT}
  ..
}
A Call is service-to-service communication: a ServiceClient communicating with a remote server. It uses a circuit breaker. It is an extra-service call.
An ask (in the context of lagom.persistence) is sending a command to a persistent entity. That happens across the nodes inside your Lagom service. It does not use circuit breaking. It is an intra-service call.
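So, as a sketch, the two timeouts live under different keys in application.conf (the circuit-breaker id hello below is just an example):
# intra-service: commands asked of persistent entities
lagom.persistence.ask-timeout = 30s

# inter-service: circuit breaker around ServiceClient calls
lagom.circuit-breaker {
  default.call-timeout = 10s
  hello.call-timeout = 30s
}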

How do I send categorized and grouped logs into Stackdriver using the Java Client Library?

I'd like to aggregate multiple log.info, log.warning, and log.error calls, and possibly stack traces, into a single Stackdriver log line generated by the server interacting with my application code. The goal is to summarize a request handled by my Scala server, and then group as many logging statements as occurred during its execution, together with any errors.
This is the default behavior in GAE logging, but because I'm new to reading Java APIs, I'm having trouble figuring out how to:
1/ create a custom MonitoredResource (?) representing, e.g., "API server", then specify a category within it (e.g. "production"). Specifically, do I have to create these via the REST API, even though I'm only doing it once for my deployment? Can I use something like Troposphere to define these in code and commit them to a repo?
2/ understand how the nouns MonitoredResource, MonitoredResourceDescriptor, LogEntry, LogEntryOperation, and logName fit together; where the categories "API Server" and "production" get defined; and where logging statement groups like GET /foobar -> 200 response + 1834 bytes can be added (are those logNames?).
No need to write code for me, of course, but pointers and a high level overview to save me trial and error would be appreciated greatly.
You can group together multiple log entries for the same operation by using the LogEntryOperation field in the LogEntry (https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#LogEntryOperation).
In the Logs Viewer, you can group the log entries by filtering on the operation.id field using the advanced filters.
In the Java client library, you can set the Operation Id using https://googlecloudplatform.github.io/google-cloud-java/0.33.0/apidocs/com/google/cloud/logging/LogEntry.Builder.html#setOperation-com.google.cloud.logging.Operation-
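As a minimal sketch with the google-cloud-logging client (the log name, operation id/producer, and the "global" resource type are illustrative choices, not requirements):
import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Operation;
import com.google.cloud.logging.Payload.StringPayload;
import com.google.cloud.logging.Severity;
import java.util.Collections;

public class GroupedLoggingSketch {
    public static void main(String[] args) {
        Logging logging = LoggingOptions.getDefaultInstance().getService();

        // One Operation per handled request; every entry sharing this id
        // can be grouped in the Logs Viewer via the operation.id filter.
        Operation op = Operation.newBuilder("request-1234", "api-server")
                .setFirst(true) // mark the first entry of the group
                .build();

        LogEntry entry = LogEntry.newBuilder(StringPayload.of("GET /foobar -> 200"))
                .setSeverity(Severity.INFO)
                .setOperation(op)
                .build();

        logging.write(Collections.singleton(entry),
                Logging.WriteOption.logName("api-server"),
                Logging.WriteOption.resource(MonitoredResource.newBuilder("global").build()));
    }
}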
1) The monitored resources you can use are a curated set defined by Google. You cannot define your own type. The supported resources are listed in https://cloud.google.com/logging/docs/api/v2/resource-list.
2) Basic concepts are described in https://cloud.google.com/logging/docs/basic-concepts.
1) The MonitoredResource monitors what you configure in the MonitoredResourceDescriptor.
I assume you can create it any way you want (via the REST API or with the client libraries).
2) I am unsure where you want to describe "API Server" or "production"; the MonitoredResourceDescriptor is how you set up what to monitor. The LogEntry is the actual log, and the logName is just a label you give this particular log. What you described should be something that a LogEntry returns (the 200 code plus other data).
I might be a bit confused about what you are asking. The best thing to do is to create a sample MonitoredResource and see how it works.

Zookeeper Network Ensemble does not start appropriately

I've been working with ZooKeeper lately to fill a requirement for reliability in distributed applications. I'm working with three computers, and I followed this tutorial:
http://sanjivblogs.blogspot.ie/2011/04/deploying-zookeeper-ensemble.html
I followed it step by step to ensure I did it correctly, but now when I start my ZooKeeper servers with
./zkServer.sh start
I'm getting these exceptions on all my computers:
2013-04-05 21:46:58,995 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#679] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1961)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2038)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:831)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:62)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:667)
2013-04-05 21:46:58,995 [myid:2] - WARN [SendWorker:1:QuorumCnxManager$SendWorker#688] - Send worker leaving thread
2013-04-05 21:47:58,363 [myid:2] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker#762] - Connection broken for id 3, my id = 2, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:747)
But I don't know what I am doing wrong. My objective is to synchronize my ZooKeeper servers across different machines so the service is always available. I went to the zookeeper.apache.org web page and looked for information on how to configure and start ZooKeeper, but it describes the same steps I followed before.
If somebody could help me, I would really appreciate it. Thanks in advance.
I needed to follow some strict steps to achieve this, but I finally got it done. If somebody is facing the same issue, to build the ZooKeeper ensemble please remember:
You need 3 ZooKeeper servers running (locally or over the network); this is the minimum number to achieve synchronization. On each server you need to create a file called "myid" (inside the ZooKeeper data directory); the content of each myid file must be a sequential number. For instance, I have three ZooKeeper servers (folders), so I have one myid with content 1, another with content 2, and another with content 3, as in the example below.
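For example, on a single machine with three ZooKeeper folders (the paths follow the dataDir convention used in the zoo.cfg below and are just my local layout):
echo 1 > /home/mtataje/var/zookeeper1/myid
echo 2 > /home/mtataje/var/zookeeper2/myid
echo 3 > /home/mtataje/var/zookeeper3/myid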
Then in zoo.cfg it is necessary to establish the required parameters:
tickTime=2000
#dataDir=/var/lib/zookeeper
dataDir=/home/mtataje/var/zookeeper1
clientPort=2184
initLimit=10
syncLimit=20
server.1=192.168.3.41:2888:3888
server.2=192.168.3.41:2889:3889
server.3=192.168.3.41:2995:2999
The zoo.cfg varies from one server to another; in my case, because I was testing locally, I needed to change the port and the dataDir.
After that, execute:
./zkServer.sh start
Some exceptions may appear, but that is because at least two ZooKeeper servers must be synchronized; once you start at least two of them, the exceptions should be gone.
Best regards.

Sending a PoisonPill to an Actor in Java

I am starting to learn Akka by migrating an existing Java SE app to it. I am using Akka 2.0.3.
At one point I need to send a PoisonPill through the message queue to stop the actors. My actor is instantiated thus:
ActorRef myActor = actorSystem.actorOf(new Props(MyActor.class), "myActor");
to which I try to send the PoisonPill:
myActor.tell(PoisonPill.getInstance());
But I get the following compiler error:
'tell(java.lang.Object)' in 'akka.actor.ActorRef' cannot be applied to '(akka.actor.PoisonPill$)'
What am I doing wrong? I'm running Java 1.6.0_26 in IDEA (which I am also learning after a lifetime in Eclipse).
Edit:
I have also tried the following approach, which is in the documentation, but I get the same compiler error, and IDEA warns me that the Actors class is deprecated.
import static akka.actor.Actors.*;
extractionActor.tell(poisonPill());
Please read the Akka documentation; we've spent a lot of time creating it:
PoisonPill
You can also send an actor the akka.actor.PoisonPill message, which will stop the actor when the message is processed. PoisonPill is enqueued as an ordinary message and will be handled after messages that were already queued in the mailbox.
Use it like this:
import static akka.actor.Actors.*;
myActor.tell(poisonPill());
http://doc.akka.io/docs/akka/2.0.3/java/untyped-actors.html#PoisonPill
The above approach has been deprecated since 2.0.2; this is the new API:
ActorRef ref = system.actorOf(new Props(JavaAPITestActor.class));
ref.tell(PoisonPill.getInstance());
The above compiles on my machine, so you might have an issue in IDEA. Try to compile it with javac and see if that works.
As mentioned in my reply to the comment above, this does not work in IDEA or when using Gradle to compile. It is in fact a compilation error, since the sender ActorRef is required. I know the previous answers are old, and I'm not sure if this was a change in the API, so for anyone having a similar issue, you should use:
target.tell(PoisonPill.getInstance(), ActorRef.noSender());
For reference, see http://doc.akka.io/docs/akka/snapshot/java/lambda-actors.html#PoisonPill
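For completeness, a self-contained sketch of the two-argument tell against the classic Java API (Akka 2.2+; DemoActor is just a placeholder):
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.PoisonPill;
import akka.actor.Props;
import akka.actor.UntypedActor;

public class PoisonPillDemo {
    public static class DemoActor extends UntypedActor {
        @Override
        public void onReceive(Object message) {
            System.out.println("got: " + message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        ActorRef ref = system.actorOf(Props.create(DemoActor.class), "demoActor");

        ref.tell("hello", ActorRef.noSender());                  // ordinary message, queued first
        ref.tell(PoisonPill.getInstance(), ActorRef.noSender()); // stops the actor once "hello" is processed
    }
}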
UPDATE FROM 25.03.2019
The good answers from Viktor Klang and yash.vyas are a bit out of date. Here is the currently working syntax for Scala 2.12.8 on JDK 8 (1.8.0_172):
val accountB = context.actorOf(Props[BankAccount], "accountB")
accountB ! PoisonPill
You could also write:
...
accountB ! PoisonPill.getInstance
The default call of the tell-Method is also working:
...
accountB.tell(PoisonPill.getInstance,ActorRef.noSender)

Axis2 getSOAPEnvelope() performance issue

Using Axis2 on Solaris, I've noticed that the message.getSOAPEnvelope() call maxes out the server's CPU (0.0 idle). The call takes about 10 seconds, and then the processing load goes back to normal. This is crazy for a single method, especially something built into Axis.
Can anyone suggest a solution to this, I've not been able to find anything similar online.
// get message for sending
Message message = getSOAPMessage();
...
message = signSOAPEnvelope(message.getSOAPEnvelope()); // problem
...
SOAPEnvelope retMsg = (SOAPEnvelope) call.invoke(message.getSOAPEnvelope()); // problem
--ADDITIONAL INFORMATION---
OK, so the issue lies in the SAXParser.parse() method called by Axis (not Axis2, by the way). I've done some further tests with other messages.
My application builds the SOAPEnvelope, and the message body is added to it. I've taken a message from another application that I know works, and after building the SOAP envelope I've overridden the message string with this older XML. So the SOAPEnvelope is identical in both cases, yet the XML I took from the other project works well while my new XML doesn't. The crazy thing is that the older XML is larger, so it should take longer. Below are examples of the relevant XML, as I can't work out why one should work and the other not.
WORKS OK: larger, older XML
<ns2:applicationDetailSearchQuery
xmlns:ns2="http://www.company.com.au/wib/ID/schema/query"
xmlns:ns3="http://www.company.com.au/wib/Counterparty/schema/query"
xmlns:tns="http://www.company.com.au/wib/icc/schema/query">
<tns:queryID scheme="http://www.company.com.au/treasury/idbb/queryid">44051</tns:queryID>
<tns:queryType>ApplicationDetailSearch</tns:queryType>
<tns:pageSize>10000</tns:pageSize>
<ns2:parameters>
<ns2:tradeIdList>
<ns2:tradeId>111111</ns2:tradeId>
</ns2:tradeIdList>
<ns2:queryByHeadDealId>N</ns2:queryByHeadDealId>
<ns2:retrieveSchedule>N</ns2:retrieveSchedule>
<ns2:retrieveCashFlowDeals>Y</ns2:retrieveCashFlowDeals>
<ns2:dealType>BOND</ns2:dealType>
</ns2:parameters>
</ns2:applicationDetailSearchQuery>
REALLY SLOW: smaller XML???
<ns5:querySetRequest setId="1" xmlns:ns2="http://schemas.company.com.au/ttt/icc/common/header-V2-0" xmlns:ns4="http://schemas.company.com.au/ttt/icc/Services/FXC/TradeEnquiryServiceEnvelope" xmlns:ns3="http://schemas.company.com.au/ttt/icc/common/envelopemsg-V2-0" xmlns:ns5="http://webservice.common.ttt/queryservice/types">
<ns5:query queryName="RemainingBalanceQuery" queryID="1">
<ns5:parameter value="FWD:169805" type="String" name="KondorId"/>
<ns5:parameter value="0.9592" type="Decimal" name="ExchgRate"/>
<ns5:parameter value="USD" type="String" name="CurrencyCode"/>
<ns5:parameter value="09/08/2011" type="String" name="MatDate"/>
</ns5:query>
</ns5:querySetRequest>
Any ideas what might be causing excessive CPU usage for this second set of XML?
This was an issue with excessive logging from the SAXParser. When I set logging to WARN for the relevant packages, it ran in milliseconds (see the sketch below). Crazy stuff!
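For anyone hitting the same thing, the fix amounts to raising the log level for the noisy packages, e.g. in log4j.properties (the package names here are guesses; check your own stack trace for the actual loggers):
log4j.logger.org.apache.axis=WARN
log4j.logger.org.apache.commons=WARN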
