I have a requirement to process large files in these two cases:
(1) Pull from file directory and push to FTP
(2) Pull from one FTP and push to another FTP
I am creating a Java Maven dependency project that uses Camel components to handle the above file-transfer use cases, so I decided to use the org.apache.camel.main.Main class to start my route. The problem is that my program does not exit even after the file is processed successfully. Somewhere I read that calling "System.exit()" would solve the problem, but the problem still exists.
My code:
Main camelMain = new Main();
camelMain.enableHangupSupport();
camelMain.addRouteBuilder(getRouteBuilderLocaltoFTP());
camelMain.run();
RouteBuilder:
public void configure() throws Exception {
    from(<File-Path>).routeId("local-to-ftp")
        .onCompletion().process(new Processor() {
            @Override
            public void process(Exchange exchange) throws Exception {
                exchange.getContext().stop();
            }
        })
        .toD(<FTP-Path>);
}
I also tried using the Control Bus component:
.toF("controlbus:route?routeId=%s&action=stop&async=true", "local-to-ftp")
But in both cases the routes shut down gracefully without the program exiting.
Please help with this.
You can configure the main class to shut down the application after a period of time, or after processing N messages. AFAIR you can set both, e.g. shut down after 2 minutes, or after 1 message has been processed.
Just check the methods on camelMain. Note that this requires a recent version of Apache Camel.
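For illustration, a minimal sketch of that idea, assuming Camel 3.x where the duration options live on main.configure() (method names may differ in older releases):

Main camelMain = new Main();
camelMain.addRouteBuilder(getRouteBuilderLocaltoFTP());

// Assumption: Camel 3.x MainConfigurationProperties
camelMain.configure().setDurationMaxSeconds(120); // stop after at most 2 minutes...
camelMain.configure().setDurationMaxMessages(1);  // ...or after 1 processed message, whichever comes first

camelMain.run(); // run() returns once a duration limit is hit and the context has shut down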
Various ways of graceful shutdown are documented here. You can try this code to shut down a particular route:
camelContext.getRouteController().stopRoute("routeId");
camelContext.removeRoute("routeId");
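As a side note: if you stop a route (or the whole context) from inside that same route, as the onCompletion processor in the question does, it is safer to trigger the stop from a separate thread so the current exchange can finish gracefully. A minimal sketch, assuming the Camel 3.x route controller API (in 2.x it would be camelContext.stopRoute("routeId")):

from(<File-Path>).routeId("local-to-ftp")
    .toD(<FTP-Path>)
    .process(exchange -> {
        CamelContext context = exchange.getContext();
        // stop the route asynchronously so this exchange completes first
        new Thread(() -> {
            try {
                context.getRouteController().stopRoute("local-to-ftp");
            } catch (Exception e) {
                // log and handle the failure to stop the route
                e.printStackTrace();
            }
        }).start();
    });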
I'm trying to execute an application under (reasonable) load. What is happening under load is that when trying to place a message onto a queue, the application stalls for about 4 seconds before completing the send. The strange part is that immediately after doing this, the next message takes a matter of milliseconds to place onto the queue. The message is in fact the same message - so the message size isn't a factor.
The application is using Spring Boot 2.1.6, with Apache Qpid 0.43.0 as the JMS/AMQP provider.
The message bus being used is Azure ServiceBus, but I have observed the same behaviour using Artemis.
On the Apache Qpid JmsConnectionFactory, I've tried fiddling with the "forceSyncSend" property.
I've tried using Spring's CachingConnectionFactory to cache message producers only. I have increased the default cache size from 1 to 20 without any success.
I've looked at the JmsTemplate parameters but can't find any parameters in regard to message producers (plenty with listeners but that's another story).
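A rough sketch of the producer-caching attempt mentioned above, with illustrative bean wiring (the broker URI is a placeholder):

@Bean
public ConnectionFactory connectionFactory() {
    // Qpid JMS factory pointing at the AMQP broker (placeholder URI)
    JmsConnectionFactory qpidFactory = new JmsConnectionFactory("amqps://<broker-uri>");
    CachingConnectionFactory caching = new CachingConnectionFactory(qpidFactory);
    caching.setCacheProducers(true);   // cache MessageProducers
    caching.setSessionCacheSize(20);   // raised from the default of 1
    return caching;
}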
The code doing the sending is quite simple:
private void sendToQueue(Object message, String queueName) {
    jmsTemplate.convertAndSend(queueName, message, (Message jmsMessage) -> {
        jmsMessage.setStringProperty(OBJECT_TYPE_PARAMETER, message.getClass().getSimpleName());
        return jmsMessage;
    });
}
Is there anything obvious to try? Are there any tuning parameters to stop this stalling from happening?
The load on the system is not trivial, but it is not excessive (it needs to go a lot higher than where it is at the moment!)
Any ideas?
I have a route defined in Camel that goes something like this: GET request comes in, a file gets created in the file system. File consumer picks it up, fetches data from external web services, and sends the resulting message by POST to other web services.
Simplified code below:
// Update request goes on queue:
from("restlet:http://localhost:9191/update?restletMethod=post")
.routeId("Update via POST")
[...some magic that defines a directory and file name based on request headers...]
.to("file://cameldest/queue?allowNullBody=true&fileExist=Ignore")
// Update gets processed
from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2&sortBy=file:parent;file:modified&preMove=inprogress&delete=true")
.routeId("Update main route")
.streamCaching() //otherwise stuff can't be sent to multiple endpoints
[...enrich message from some web service using http4 component...]
.multicast()
.stopOnException()
.to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML")
.end();
The three endpoints in the multicast are simply POSTing the resulting message to other web services.
This all works rather well when the queue (i.e. the file directory cameldest) is fairly empty. Files are being created in cameldest/<subdir>, picked up by the file consumer and moved into cameldest/<subdir>/inprogress, and stuff is being sent to the three outgoing POSTs no problem.
However, once the incoming requests pile up to about 300,000 files, progress slows down and eventually the pipeline fails due to out-of-memory errors (GC overhead limit exceeded).
By increasing logging I can see that the file consumer polling basically never runs, because it appears to take responsibility for all files it sees at each poll, waits for them to be done processing, and only then starts another poll round. Besides (I assume) causing the resource bottleneck, this also interferes with my sorting requirements: once the queue is jammed with thousands of messages waiting to be processed, new messages that would naively be sorted higher up are still waiting behind those that are already "started" (if they even get picked up at all).
Now, I've tried the maxMessagesPerPoll and eagerMaxMessagesPerPoll options. They seem to alleviate the problem at first, but after a number of poll rounds I still end up with thousands of files in "started" limbo.
The only thing that sort of worked was making the bottleneck of delay and maxMessages... so narrow that the processing on average would finish faster than the file polling cycle.
Clearly, that is not what I want. I would like my pipeline to process files as fast as possible, but not faster. I was expecting the file consumer to wait when the route is busy.
Am I making an obvious mistake?
(I'm running a somewhat older Camel 2.14.0 on a Redhat 7 machine with XFS, if that is part of the problem.)
Try setting maxMessagesPerPoll to a low value on the from-file endpoint to pick up at most X files per poll, which also limits the total number of in-flight messages you will have in your Camel application.
You can find more information about that option in the Camel documentation for the file component.
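A minimal sketch of what that could look like on the consumer from the question (the values are illustrative; note that when sortBy is used, eagerMaxMessagesPerPoll=false makes Camel scan and sort all files before applying the per-poll limit):

from("file://cameldest/queue?delay=500&recursive=true&maxDepth=2"
        + "&sortBy=file:parent;file:modified"
        + "&maxMessagesPerPoll=50&eagerMaxMessagesPerPoll=false"
        + "&preMove=inprogress&delete=true")
    .routeId("Update main route")
    .log("Polled ${file:name}"); // ...rest of the route as in the question...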
The short answer is that there is no answer: the sortBy option of Camel's file component is simply too memory-inefficient to accommodate my use case:
Uniqueness: I don't want to put a file on queue if it's already there.
Priority: Files flagged as high priority should be processed first.
Performance: Having a few hundred thousands of files, or maybe even a few million, should be no problem.
FIFO: (Bonus) Oldest files (by priority) should be picked up first.
The problem appears to be, if I read the source code and the documentation correctly, that all file details are in memory to perform the sorting, no matter whether the built-in language or a custom pluggable sorter is used. The file component always creates a list of objects containing all details, and that apparently causes an insane amount of garbage collection overhead when polling many files often.
I got my use case to work, mostly, without having to resort to using a database or writing a custom component, using the following steps:
Move from one file consumer on the parent directory cameldest/queue that recursively sorts the files in the subdirectories (cameldest/queue/high/ before cameldest/queue/low/) to two consumers, one for each directory, with no sorting at all.
Set up only the consumer from /cameldest/queue/high/ to process files through my actual business logic.
Set up the consumer from /cameldest/queue/low to simply promote files from "low" to "high" (copying them over, i.e. .to("file://cameldest/queue/high");)
Crucially, in order to promote from "low" to "high" only when "high" is not busy, attach a route policy to "high" that throttles the other route ("low") whenever there are any messages in-flight in "high".
Additionally, I added a ThrottlingInflightRoutePolicy to "high" to prevent it from inflighting too many exchanges at once.
Imagine this like at check-in at the airport, where tourist travellers are invited over into the business class lane if that is empty.
This worked like a charm under load, and even while hundreds of thousands of files were on queue in "low", new messages (files) dropped directly into "high" got processed within seconds.
The only requirement that this solution doesn't cover, is the orderedness: There is no guarantee that older files are picked up first, rather they are picked up randomly. One could imagine a situation where a steady stream of incoming files could result in one particular file X just always being unlucky and never being picked up. The chance of that happening, though, is very low.
Possible improvement: Currently the threshold for allowing / suspending the promotion of files from "low" to "high" is set to 0 messages inflight in "high". On the one hand, this guarantees that files dropped into "high" will be processed before another promotion from "low" is performed, on the other hand it leads to a bit of a stop-start-pattern, especially in a multi-threaded scenario. Not a real problem though, the performance as-is was impressive.
Source:
My route definitions:
ThrottlingInflightRoutePolicy trp = new ThrottlingInflightRoutePolicy();
trp.setMaxInflightExchanges(50);
SuspendOtherRoutePolicy sorp = new SuspendOtherRoutePolicy("lowPriority");
from("file://cameldest/queue/low?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
.routeId("lowPriority")
.log("Copying over to high priority: ${in.headers."+Exchange.FILE_PATH+"}")
.to("file://cameldest/queue/high");
from("file://cameldest/queue/high?delay=500&maxMessagesPerPoll=25&preMove=inprogress&delete=true")
.routeId("highPriority")
.routePolicy(trp)
.routePolicy(sorp)
.threads(20)
.log("Before: ${in.headers."+Exchange.FILE_PATH+"}")
.delay(2000) // This is where business logic would happen
.log("After: ${in.headers."+Exchange.FILE_PATH+"}")
.stop();
My SuspendOtherRoutePolicy, loosely modeled on ThrottlingInflightRoutePolicy:
public class SuspendOtherRoutePolicy extends RoutePolicySupport implements CamelContextAware {
private CamelContext camelContext;
private final Lock lock = new ReentrantLock();
private String otherRouteId;
public SuspendOtherRoutePolicy(String otherRouteId) {
super();
this.otherRouteId = otherRouteId;
}
@Override
public CamelContext getCamelContext() {
return camelContext;
}
@Override
public void onStart(Route route) {
super.onStart(route);
if (camelContext.getRoute(otherRouteId) == null) {
throw new IllegalArgumentException("There is no route with the id '" + otherRouteId + "'");
}
}
@Override
public void setCamelContext(CamelContext context) {
camelContext = context;
}
@Override
public void onExchangeDone(Route route, Exchange exchange) {
//log.info("Exchange done on route " + route);
Route otherRoute = camelContext.getRoute(otherRouteId);
//log.info("Other route: " + otherRoute);
throttle(route, otherRoute, exchange);
}
protected void throttle(Route route, Route otherRoute, Exchange exchange) {
// this works the best when this logic is executed when the exchange is done
Consumer consumer = otherRoute.getConsumer();
int size = getSize(route, exchange);
boolean stop = size > 0;
if (stop) {
try {
lock.lock();
stopConsumer(size, consumer);
} catch (Exception e) {
handleException(e);
} finally {
lock.unlock();
}
}
// reload size in case a race condition with too many at once being invoked
// so we need to ensure that we read the most current size and start the consumer if we are already too low
size = getSize(route, exchange);
boolean start = size == 0;
if (start) {
try {
lock.lock();
startConsumer(size, consumer);
} catch (Exception e) {
handleException(e);
} finally {
lock.unlock();
}
}
}
private int getSize(Route route, Exchange exchange) {
return exchange.getContext().getInflightRepository().size(route.getId());
}
private void startConsumer(int size, Consumer consumer) throws Exception {
boolean started = super.startConsumer(consumer);
if (started) {
log.info("Resuming the other consumer " + consumer);
}
}
private void stopConsumer(int size, Consumer consumer) throws Exception {
boolean stopped = super.stopConsumer(consumer);
if (stopped) {
log.info("Suspending the other consumer " + consumer);
}
}
}
I would propose an alternative solution unless you really need to save the data as files.
From your restlet consumer, send each request to a message queuing app such as ActiveMQ or RabbitMQ, or something similar. You will quickly end up with lots of messages on that queue, but that is OK.
Then replace your file consumer with a queue consumer. It will take some time, but each message should be processed separately and sent to wherever you want. I have tested RabbitMQ with about 500,000 messages and that has worked fine. This should reduce the load on the consumer as well.
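A minimal sketch of that idea, assuming the ActiveMQ (JMS) component is available; the queue name is illustrative:

// Restlet consumer puts every request on a JMS queue instead of a file
from("restlet:http://localhost:9191/update?restletMethod=post")
    .routeId("Update via POST")
    .to("activemq:queue:cameldest.updates");

// Queue consumer does the actual work; the broker buffers any backlog
from("activemq:queue:cameldest.updates?concurrentConsumers=10")
    .routeId("Update main route")
    .streamCaching()
    // ...enrich and multicast as in the original route...
    .to("direct:sendUpdate", "direct:dependencyCheck", "direct:saveXML");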
In JConsole, we can see the following route statistics:
Minimum / maximum / mean processing time
First / last message completion time
Number of messages failed or redelivered
Total number of transactions processed
Requirement: I need to show the above data on a web page.
Below is my code:
public void process(Exchange exchange) throws Exception {
    CamelContext context = exchange.getContext();
    List<Route> routes = context.getRoutes();
    for (Route route : routes) {
        String routeId = route.getId();
        boolean started = context.getRouteStatus(routeId).isStarted();
        boolean stopped = context.getRouteStatus(routeId).isStopped();
        boolean suspended = context.getRouteStatus(routeId).isSuspended();
        // TODO: find min/max/mean processing time, first/last message
        // completion time, etc.
    }
}
Thanks in advance.
Please suggest how to get the min/max/mean processing time, first/last message completion time, etc.
See for example the Camel Karaf commands that can dump statistics too. They use the JMX API to do that.
An example is the context-info command: https://github.com/apache/camel/blob/master/platforms/karaf/commands/src/main/java/org/apache/camel/karaf/commands/ContextInfo.java
Apache Camel exposes this information via JMX.
A good starting point is the official JMX tutorial and the Apache Camel JMX documentation.
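For example, a minimal sketch that reads those numbers from inside a processor through Camel's managed route MBeans (assumes JMX management is enabled and the Camel 2.x getManagedRoute API; the accessors come from ManagedRouteMBean):

public void process(Exchange exchange) throws Exception {
    CamelContext context = exchange.getContext();
    for (Route route : context.getRoutes()) {
        ManagedRouteMBean managed =
                context.getManagedRoute(route.getId(), ManagedRouteMBean.class);
        if (managed == null) {
            continue; // JMX management is disabled
        }
        long min = managed.getMinProcessingTime();
        long max = managed.getMaxProcessingTime();
        long mean = managed.getMeanProcessingTime();
        long completed = managed.getExchangesCompleted();
        long failed = managed.getExchangesFailed();
        long redeliveries = managed.getRedeliveries();
        Date first = managed.getFirstExchangeCompletedTimestamp();
        Date last = managed.getLastExchangeCompletedTimestamp();
        // expose these values to the web page as needed
    }
}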
You can actually calculate the info you require using org.apache.camel.management.PublishEventNotifier.
One type of event you will get notified of concerns Camel exchanges (completion, failure, ...) on each route. The only other piece of information you need is the processing time of this (last) exchange, which is obtainable via JMX (LastProcessingTime).
Once you have the exchange processing times for each route, all the information you require can be calculated in real time.
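A rough sketch of that approach, using Camel's EventNotifierSupport base class (PublishEventNotifier builds on the same event stream); Camel 2.x event classes are assumed:

public class StatsCollectingNotifier extends EventNotifierSupport {

    @Override
    public void notify(EventObject event) throws Exception {
        if (event instanceof ExchangeCompletedEvent) {
            Exchange exchange = ((ExchangeCompletedEvent) event).getExchange();
            String routeId = exchange.getFromRouteId();
            // update your own min/max/mean counters per route here
        }
    }

    @Override
    public boolean isEnabled(EventObject event) {
        // only deliver exchange-completed events to notify()
        return event instanceof ExchangeCompletedEvent;
    }
}

// Registration on the context:
// camelContext.getManagementStrategy().addEventNotifier(new StatsCollectingNotifier());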
We are using Camel fluent builders to set up a series of complex routes, in which we are using dynamic routing using the RecipientList functionality.
We've encountered issues where in some cases, the recipient list contains a messaging endpoint that doesn't exist (for example, something like seda:notThere).
A simple example is something like this:
from("seda:SomeSource")....to("seda:notThere");
How can I configure the route so that if the exchange tries to route to an endpoint that doesn't already exist, an error is thrown?
I'm using Camel 2.9.x, and I've already experimented with the Dead Letter Channel and various Error Handler implementations, with (seemingly) no errors or warnings logged.
The only logging I see indicates that Camel is (attempting to) send to the endpoint which doesn't exist:
2013-07-03 16:07:08,030|main|DEBUG|o.a.c.p.SendProcessor|>>>> Endpoint[seda://notThere] Exchange[Message: x.y.Z#293b9fae]
Thanks in advance!
All endpoints behave differently in this case.
If you attempt to write to an FTP server that does not exist, you certainly get an error (connection refused or otherwise).
This is also true for a number of other endpoints.
SEDA queues get created if they do not exist, and the message will be left there. So your route actually sends to "notThere", and the message will sit there until the application restarts or someone starts consuming messages from seda:notThere. This is the way SEDA queues are designed. If you set the size of the SEDA queue, e.g. to("seda:notThere?size=100"), and nobody is reading (or reading slowly), you will get exceptions from message 101 onwards.
If you need to be sure some route is consuming your messages, use "direct" instead of "seda". You can even add a middle layer that combines the staging features of seda with direct's guarantee that a consumer is active (useful if the recipient list is built from user input, god forbid):
from("whatever").recipentList( ... ); // "direct:ep1" work, "direct:ep2" throws exception
from("direct:ep1").to("seda:ep1");
from("seda:ep1").doRealStagedStuffHere();