I am looking at an issue that we have in our application. Spring Integration is used to poll a particular directory and then process the files in it. It can process 5k 1 KB files, but sometimes there is a huge pause where the application just sits idle doing nothing, and then it completes the run in 4 minutes. The next run takes a bit longer, the one after that slightly longer again, and so on, until I restart the application, at which point it goes back to the 4-minute mark. Has anyone experienced this issue before?
I wrote a standalone version without Spring Integration and don't get the same issue.
I have also pasted the XML config below, just in case I have done something wrong that I can't spot.
Thanks in advance.
<!-- Poll the input file directory for new files. If found, send a Java File object on inputFileChannel -->
<file:inbound-channel-adapter directory="file:${filepath}"
channel="inputFileChannel" filename-regex=".+-OK.xml">
<si:poller fixed-rate="5000" max-messages-per-poll="1" />
</file:inbound-channel-adapter>
<si:channel id="inputFileChannel" />
<!-- Call splitFile() and start parsing the XML inside the File -->
<si:service-activator input-channel="inputFileChannel"
method="splitFile" ref="splitFileService">
</si:service-activator>
<!-- Poll the input file directory for new files. If found, send a Java File object on inputFileRecordChannel -->
<file:inbound-channel-adapter directory="file:${direcotrypath}" channel="inputFileRecordChannel" filename-regex=".+-OK.xml">
<si:poller fixed-rate="5000" max-messages-per-poll="250" task-executor="executor" />
</file:inbound-channel-adapter>
<task:executor id="executor" pool-size="8"
queue-capacity="0"
rejection-policy="DISCARD"/>
<si:channel id="inputFileRecordChannel" />
<!-- Call processFile() and start parsing the XML inside the File -->
<si:service-activator input-channel="inputFileRecordChannel"
method="processFile" ref="processedFileService">
</si:service-activator>
<si:channel id="wsRequestsChannel"/>
<!-- Sends messages from wsRequestsChannel to the httpSender, and returns the responses on
wsResponsesChannel. This is used once for each record found in the input file. -->
<int-ws:outbound-gateway uri="#{'http://localhost:'+interfaceService.getWebServiceInternalInterface().getIpPort()+'/ws'}"
message-sender="httpSender"
request-channel="wsRequestsChannel" reply-channel="wsResponsesChannel" mapped-request-headers="soap-header"/>
<!-- Handles the responses from the web service (wsResponsesChannel). Again
this is used once for each response from the web service -->
<si:service-activator input-channel="wsResponsesChannel"
method="handleResponse" ref="responseProcessedFileService">
</si:service-activator>
As I surmised in the comment on your question, the (default) AcceptOnceFileListFilter does not scale well for a large number of files, because it performs a linear search over the previously processed files.
We can make some improvements there; I opened a JIRA Issue for that.
However, if you don't need the semantics of that filter (i.e. your flow removes the input file on completion), you can replace it with another filter, such as an AcceptAllFileListFilter.
If you do need accept-once semantics, you will need a more efficient implementation for such a large number of files. But I would warn that with this many files, if you don't remove them after processing, things are going to slow down anyway, regardless of the filter.
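As a sketch of that replacement (bean names here are illustrative): the `filter` attribute replaces `filename-regex`, so the regex from the original adapter moves into a RegexPatternFileListFilter bean, and no AcceptOnceFileListFilter takes part in the chain:

```xml
<bean id="okFilesFilter"
      class="org.springframework.integration.file.filters.RegexPatternFileListFilter">
    <constructor-arg value=".+-OK.xml"/>
</bean>

<!-- Same poller as before, but no accept-once tracking and no linear scan -->
<file:inbound-channel-adapter directory="file:${filepath}"
        channel="inputFileChannel" filter="okFilesFilter">
    <si:poller fixed-rate="5000" max-messages-per-poll="1"/>
</file:inbound-channel-adapter>
```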
I have the following code:
<sftp:connector name="ImportStatusUpdateSFTP" validateConnections="true" doc:name="SFTP"/>
<flow name="UpdateFlow1" doc:name="UpdateFlow1">
<sftp:inbound-endpoint sizeCheckWaitTime="${sftpconnector.sizeCheckWaitTime}" connector-ref="ImportStatusUpdateSFTP" host="${sftp.host}" port="${sftp.port}"
path="${sftp.path}" user="${sftp.user}" password="${sftp.password}" responseTimeout="${sftp.responseTimeout}"
archiveDir="${mule.servicefld}${sftp.archiveDir}" archiveTempReceivingDir="${sftpconnector.archiveTempReceivingDir}" archiveTempSendingDir="${sftpconnector.archiveTempSendingDir}"
tempDir="${sftp.tempDir}" doc:name="SFTP" pollingFrequency="${sftp.poll.frequency}">
<file:filename-wildcard-filter pattern="*.xml"/>
</sftp:inbound-endpoint>
<idempotent-message-filter idExpression="#[headers:originalFilename]"
throwOnUnaccepted="true" storePrefix="Idempotent_Message" doc:name="Idempotent Message"
doc:description="Check for processing the same file again.">
<simple-text-file-store name="FTP_files_names"
maxEntries="1000" entryTTL="-1" expirationInterval="3600"
directory="${mule.servicefld}${idempotent.fileDir}" />
</idempotent-message-filter>
<object-to-byte-array-transformer doc:name="Object to Byte Array"/>
<message-filter onUnaccepted="Status_UpdateFlow_XML_Validation_Failed">
<mulexml:schema-validation-filter schemaLocations="xsd/StatusUpdate.xsd" returnResult="false" doc:name="Schema_Validation"/>
</message-filter>
<vm:outbound-endpoint exchange-pattern="one-way"
path="StatusUpdateIN" doc:name="StatusUpdateVMO" />
<default-exception-strategy>
<vm:outbound-endpoint path="serviceExceptionHandlingFlow" />
</default-exception-strategy>
</flow>
My problem is that if there are lots of files on the SFTP server (1000+), the flow takes them all, converts them, validates them, and then sends them to the outbound endpoint, and this puts a strain on the application's processing part.
Is there a way to limit, split, batch, filter, or take any other kind of action so that only a maximum number of messages/files are sent to the outbound endpoint?
In Mule 3 there is no built-in generic way to do this; there may be possible solutions on a case-by-case basis. In Mule 4 there is a simple way, using the maxConcurrency attribute on flows.
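In Mule 4 terms, the cap can be sketched directly on the flow (flow name and value here are illustrative):

```xml
<flow name="statusUpdateFlow" maxConcurrency="4">
    <!-- sftp listener, transformers, validation, vm publish, etc. -->
</flow>
```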
I'm creating a Mule extension using Mule 4.1.4.
Is there a way to get information about the next operation in my operation?
For example, in the following definition, I want to know that the next operation is vm:publish.
<foo:my-operation />
<vm:publish queueName="myQueue">
<vm:content>#[payload.body]</vm:content>
</vm:publish>
And, in the following definition, I want to know that the next operation is http:request.
<foo:my-operation />
<async>
<http:request method="GET" path="greet" config-ref="clientConfig"/>
</async>
I can get my operation information using ComponentLocation, but I don't know how to get the next operation information.
I don't think that is possible, or that it should be attempted. I suspect it goes against the expected usage of components in flows in Mule: the expectation is that you can swap components in a flow as long as they agree on the data exchanged between them. This would introduce a coupling between components.
It would be better to create a message processor for each use case, and it would probably be easier too.
Example:
<foo:my-operation-formatA />
<vm:publish queueName="myQueue">
<vm:content>#[payload.body]</vm:content>
</vm:publish>
<foo:my-operation-formatB />
<async>
<http:request method="GET" path="greet" config-ref="clientConfig"/>
</async>
My spring-boot-integration app can run on multiple servers (nodes), but they are all supposed to read a common directory. I wrote a custom locker that takes a lock on each file, so that no other instance can process the same file. All Spring configuration has been done in XML.
The application acquires the lock but is unable to read the content of the locked file:
java.io.IOException: The process cannot access the file because another process has locked a portion of the file
As suggested in forums, we can get access to the locked file's content only through a ByteBuffer. So I tried to transform the file to bytes using file-to-bytes-transformer and passed that as input to the outbound gateway. But the instance is not starting.
Any suggestions?
<file:file-to-bytes-transformer input-channel="filesOut" output-channel="filesOutChain"/>
<integration:chain id="filesOutChain" input-channel="filesOutChain">
<file:outbound-gateway id="fileMover"
auto-create-directory="true"
directory-expression="headers.TARGET_PATH"
mode="REPLACE">
<file:request-handler-advice-chain>
<ref bean="retryAdvice" />
</file:request-handler-advice-chain>
</file:outbound-gateway>
<integration:gateway request-channel="filesChainChannel" error-channel="errorChannel"/>
</integration:chain>
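As background to the ByteBuffer point above, reading a file through the same FileChannel that holds the lock is one way to avoid the "another process has locked a portion of the file" error that a second, independent stream would hit. This is a standalone sketch, not the app's actual code; the class and method names are made up:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockedFileReader {

    // Lock the file and read its content through the same channel that
    // holds the lock, decoding via a ByteBuffer.
    public static String readWhileLocked(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileLock lock = channel.lock()) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) > 0) {
                // keep reading until the buffer is full or EOF
            }
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("locked-demo", ".xml");
        Files.write(tmp, "<doc/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readWhileLocked(tmp));
        Files.delete(tmp);
    }
}
```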
I am fairly new to Spring and Spring Batch, so feel free to ask any clarifying questions if you have any.
I am seeing an issue with Spring Batch that I cannot recreate in our test or local environments. We have a daily job that connects to Websphere MQ via JMS and retrieves a set of records. This job uses the out-of-the-box JMS ItemReader. We implement our own ItemProcessor, but it doesn't do anything special other than logging. There are no filters or processing that should affect incoming records.
The problem is that out of the 10,000+ daily records on MQ, only about 700 or so (the exact number is different each time) usually get logged in the ItemProcessor. All records are successfully pulled off the queue. The number of records logged is different each time and seems to have no pattern. By comparing the log files against the list of records in MQ, we can see that a seemingly random subset of records are being "processed" by our job. The first record might get picked up, then 50 are skipped, then 5 in a row, etc. And the pattern is different each time the job runs. No exceptions are logged either.
When running the same app in localhost and test using the same data set, all 10,000+ records are successfully retrieved and logged by the ItemProcessor. The job runs between 20 and 40 seconds in Production (also not constant), but in test and local it takes several minutes to complete (which obviously makes sense since it is handling so many more records).
So this is one of those tough issues to troubleshoot, since we cannot recreate it. One idea is to implement our own ItemReader and add additional logging, so that we can see whether records are getting lost before or after the reader - all we know now is that only a subset of records are being handled by the ItemProcessor. But even that will not solve our problem, and it will be somewhat time-consuming to implement considering it is not even a solution.
Has anyone else seen an issue like this? Any possible ideas or troubleshooting suggestions would be greatly appreciated. Here are some of the jar version numbers we are using for reference.
Spring - 3.0.5.RELEASE
Spring Integration - 2.0.3.RELEASE
Spring Batch - 2.1.7.RELEASE
Active MQ - 5.4.2
Websphere MQ - 7.0.1
Thanks in advance for your input.
EDIT: Per request, code for processor:
public SMSReminderRow process(Message message) throws Exception {
    SMSReminderRow retVal = new SMSReminderRow();
    LOGGER.debug("Converting JMS Message to ClaimNotification");
    ClaimNotification notification = createClaimNotificationFromMessage(message);
    retVal.setShortCode(BatchCommonUtils
            .parseShortCodeFromCorpEntCode(notification.getCorpEntCode()));
    retVal.setUuid(UUID.randomUUID().toString());
    retVal.setPhoneNumber(notification.getPhoneNumber());
    retVal.setMessageType(EventCode.SMS_CLAIMS_NOTIFY.toString());
    DCRContent content = tsContentHelper.getTSContent(
            Calendar.getInstance().getTime(),
            BatchCommonConstants.TS_TAG_CLAIMS_NOTIFY,
            BatchCommonConstants.TS_TAG_SMSTEXT_TYP);
    String claimsNotificationMessage = formatMessageToSend(content.getContent(),
            notification.getCorpEntCode());
    retVal.setMessageToSend(claimsNotificationMessage);
    retVal.setDateTimeToSend(TimeUtils.getGMTDateTimeStringForDate(new Date()));
    LOGGER.debug("Finished processing claim notification for {}. Writing row to file.",
            notification.getPhoneNumber());
    return retVal;
}
JMS config:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd">
<bean id="claimsQueueConnectionFactory" class="org.springframework.jndi.JndiObjectFactoryBean">
<property name="jndiName" value="jms/SMSClaimNotificationCF" />
<property name="lookupOnStartup" value="true" />
<property name="cache" value="true" />
<property name="proxyInterface" value="javax.jms.ConnectionFactory" />
</bean>
<bean id="jmsDestinationResolver"
class="org.springframework.jms.support.destination.DynamicDestinationResolver">
</bean>
<bean id="jmsJndiDestResolver"
class="org.springframework.jms.support.destination.JndiDestinationResolver"/>
<bean id="claimsJmsTemplate" class="org.springframework.jms.core.JmsTemplate">
<property name="connectionFactory" ref="claimsQueueConnectionFactory" />
<property name="defaultDestinationName" value="jms/SMSClaimNotificationQueue" />
<property name="destinationResolver" ref="jmsJndiDestResolver" />
<property name="pubSubDomain">
<value>false</value>
</property>
<property name="receiveTimeout">
<value>20000</value>
</property>
</bean>
As a rule, MQ will NOT lose messages when properly configured. The question then is what does "properly configured" look like?
Generally, lost messages are caused by non-persistence or non-transactional GETs.
If non-persistent messages are traversing QMgr-to-QMgr channels and NPMSPEED(FAST) is set then MQ will not log errors if they are lost. That is what those options are intended to be used for so no error is expected.
Fix: Set NPMSPEED(NORMAL) on the QMgr-to-QMgr channel or make the messages persistent.
If the client is getting messages outside of syncpoint, messages can be lost. This has nothing to do with MQ specifically; it's just how messaging in general works. If you tell MQ to get a message destructively off the queue and it cannot deliver that message to the remote application, then the only way for MQ to roll it back is if the message was retrieved under syncpoint.
Fix: Use a transacted session.
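In Spring JMS terms, one way to sketch this fix is to mark the consuming session as transacted; the property is shown here on a JmsTemplate (reusing the connection factory from the question's config), and the same flag exists on listener containers:

```xml
<bean id="transactedJmsTemplate" class="org.springframework.jms.core.JmsTemplate">
    <property name="connectionFactory" ref="claimsQueueConnectionFactory"/>
    <property name="sessionTransacted" value="true"/>
</bean>
```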
There are some additional notes, born out of experience.
Everyone swears message persistence is set to what they think it is. But when I stop the application and inspect the messages manually it very often is not what is expected. It's easy to verify so don't assume.
If a message is rolled back onto the queue, it won't happen until MQ or TCP times out the orphaned channel. This can take up to 2 hours, so tune the channel parameters and TCP keepalive to reduce that.
Check MQ's error logs (the ones at the QMgr not the client) to look for messages about transactions rolling back.
If you still cannot determine where the messages are going, try tracing with SupportPac MA0W. This trace runs as an exit and it is extremely configurable. You can trace all GET operations on a single queue and only that queue. The output is in human-readable form.
See http://activemq.apache.org/jmstemplate-gotchas.html .
There are issues using the JmsTemplate. I only ran into them when I upgraded my hardware and suddenly exposed a pre-existing race condition.
The short form is that, by design and intent, the JmsTemplate opens and closes the connection on every invocation. It will not see messages older than its own creation, and in high-volume and/or high-throughput scenarios it will fail to read some messages.
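One mitigation sketch, assuming the Spring JMS config shown earlier in the question: wrap the JNDI connection factory in a CachingConnectionFactory, so the JmsTemplate reuses connections and sessions instead of opening and closing them on every call (the cache size here is illustrative):

```xml
<bean id="cachingConnectionFactory"
      class="org.springframework.jms.connection.CachingConnectionFactory">
    <property name="targetConnectionFactory" ref="claimsQueueConnectionFactory"/>
    <property name="sessionCacheSize" value="10"/>
</bean>
```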
I'm using Camel (2.11.0) to try and achieve the following functionality:
If a file exists at a certain location, copy it to another location and then begin processing it
If no such file exists, then I don't want the file consumer/poller to block; I just want processing to continue to a direct:cleanup route
I only want the file to be polled once!
Here's what I have so far (using Spring XML):
<camelContext id="my-camel-context" xmlns="http://camel.apache.org/schema/spring">
<route id="my-route">
<from uri="file:///home/myUser/myApp/fizz?include=buzz_.*txt"/>
<choice>
<when>
<!-- If the body is empty/NULL, then there was no file. Send to cleanup route. -->
<simple>${body} == null</simple>
<to uri="direct:cleanup" />
</when>
<otherwise>
<!-- Otherwise we have a file. Copy it to the parent directory, and then continue processing. -->
<to uri="file:///home/myUser/myApp" />
</otherwise>
</choice>
<!-- We should only get here if a file existed and we've already copied it to the parent directory. -->
<to uri="bean:shouldOnlyGetHereIfFileExists?method=doSomething" />
</route>
<!--
Other routes defined down here, including one with a "direct:cleanup" endpoint.
-->
</camelContext>
With the above configuration, if there is no file at /home/myUser/myApp/fizz, then Camel just waits/blocks until there is one. Instead, I want it to just give up and move on to direct:cleanup.
And if there is a file, I see it getting processed inside the shouldOnlyGetHereIfFileExists bean, but I do not see it getting copied to /home/myUser/myApp; so it's almost as if the <otherwise> element is being skipped/ignored altogether!
Any ideas? Thanks in advance!
Try this setting, and tune your polling interval to suit:
From Camel File Component docs:
sendEmptyMessageWhenIdle
default =false
Camel 2.9: If the polling consumer did not poll any files, you can enable this option to send an empty message (no body) instead.
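Applied to the route in the question, the endpoint would look something like this (everything except the new option is taken from the original URI):

```xml
<from uri="file:///home/myUser/myApp/fizz?include=buzz_.*txt&amp;sendEmptyMessageWhenIdle=true"/>
```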
Regarding writing the file, add a log statement inside the <otherwise> to ensure it's being executed. If so, check file / folder permissions, etc.
Good luck.
One error I faced when I tried using the condition:
<simple>${body} != null</simple>
was that it always returned true.
Please go through the below link:
http://camel.465427.n5.nabble.com/choice-when-check-BodyType-null-Body-null-td4259599.html
It may help you.
This is very old, but if anyone finds this, you can poll only once with
"?repeatCount=1"
I know the question was asked almost 4 years ago, but I had exactly the same problem yesterday.
So I will leave my answer here; maybe it will help someone else.
I am using Camel, version 3.10.0
To make it work exactly as described in the question:
If a file exists at a certain location, copy it to another location and then begin processing it
If no such file exists, then I don't want the file consumer/poller to block; I just want processing to continue to a direct:cleanup route
I ONLY want the file to be polled once!
Using ${body} == null
The configuration options that we need are:
sendEmptyMessageWhenIdle=true // Send an empty body when idle
maxMessagesPerPoll=1 // Max files that it will take at once
repeatCount=1 // How many times it will execute the poll (above)
greedy=true // If the last poll found files, it will
execute one more time
XML:
<camel:endpoint id="" uri="file:DIRECTORY?sendEmptyMessageWhenIdle=true&initialDelay=100&maxMessagesPerPoll=1&repeatCount=1&greedy=false" />