Apache Camel: Cached stream file deletion causing file not found errors - java

Scenario:
I am trying to stream and process some large XML files. These files are sent from a producer asynchronously.
producerTemplate.sendBodyAndHeaders(endpointUri, inStream, ImmutableMap.of(JOBID_PROPERTY, importJob.getId()));
I need to batch all file input streams, identify the files by probing them with xpath and reorder them according to their content. I have the following route:
from("direct:route1")
    .streamCaching()
    .choice()
        .when(xpath("//Tag1")).setHeader("execOrder", constant(3)).setHeader("xmlRoute", constant("direct:some-route"))
        .when(xpath("//Tag2")).setHeader("execOrder", constant(1)).setHeader("xmlRoute", constant("direct:some-other-route"))
        .when(xpath("//Tag3")).setHeader("execOrder", constant(2)).setHeader("xmlRoute", constant("direct:yet-another-route"))
        .otherwise()
            .to("direct:somewhereelse")
    .end()
    .resequence(header("execOrder"))
        .batch(new BatchResequencerConfig(300, 10000L))
        .allowDuplicates()
    .recipientList(header("xmlRoute"));
When running my code I get the following error:
2017-11-23 11:43:13.442 INFO 10267 --- [ - Batch Sender] c.w.n.s.m.DefaultImportJobService : Updating entity ImportJob with id 5a16a61803af33281b22c716
2017-11-23 11:43:13.451 WARN 10267 --- [ - Batch Sender] org.apache.camel.processor.Resequencer : Error processing aggregated exchange: Exchange[ID-int-0-142-bcd-wsint-pro-59594-1511433568520-0-20]. Caused by: [org.apache.camel.RuntimeCamelException - Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp]
org.apache.camel.RuntimeCamelException: Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp
at org.apache.camel.converter.stream.FileInputStreamCache.reset(FileInputStreamCache.java:91)
I've read here that the FileInputStreamCache is closed when XPathBuilder.getDocument() is called, and the temp file is deleted, so you get the FileNotFoundException when the XPathBuilder wants to reset the InputStream.
The solution seems to be to disable spooling to disk like this:
camelContext.getStreamCachingStrategy().setSpoolThreshold(-1);
However, I don't want to do that because of RAM restrictions, i.e. files can get up to 600MB and I don't want to keep them in memory. Any ideas how to solve the problem?

The resequencer is a stateful, two-leg pattern. It keeps a copy of each exchange in memory while resequencing until the gap is filled, so the original exchange is completed before the resequencer sends the messages out in the new order.
Since your input stream comes from some HTTP service, it will already be closed by the time the resequencer outputs the exchange.
Either store the files to local disk first, as suggested, and let the resequencer work on those, or find a way to avoid the resequencer.
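The "store to local disk first" idea can be sketched in plain Java, outside of Camel. This is a minimal illustration only: SpoolToDisk and its methods are hypothetical names, and the temp file location is whatever the JVM picks. The point is that a file you own can be reopened as often as needed, while the original one-shot stream cannot.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Camel: copies a one-shot stream to a file
// that the caller owns, so later steps can reopen it as often as needed.
final class SpoolToDisk {

    // Writes the whole stream to a new temp file and returns its path.
    static Path spool(InputStream in) {
        try {
            Path file = Files.createTempFile("xml-upload-", ".xml");
            try (InputStream source = in) {
                // createTempFile already created the file, so replace it.
                Files.copy(source, file, StandardCopyOption.REPLACE_EXISTING);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens the spooled file again and reads it fully.
    static String readAll(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream upload = new ByteArrayInputStream("<Tag2/>".getBytes(StandardCharsets.UTF_8));
        Path file = spool(upload);
        // First read: probing the content. Second read: the actual processing.
        System.out.println(readAll(file));
        System.out.println(readAll(file));
    }
}
```

readAllBytes is used only to keep the sketch short; for 600MB files the later steps would stream from the file instead.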

I ended up doing what Claus and Ricardo suggested. I made a separate route which saves the files to disk. Then another one which probes the files and resequences the exchanges according to a fixed order.
String xmlUploadDirectory = "file://" + Files.createTempDir().getPath() + "/xmls?noop=true";
// Route 1: save each incoming stream to disk
from("direct:route1")
    .to(xmlUploadDirectory);
// Route 2: probe the saved files and tag them with an order and a target route
from(xmlUploadDirectory)
    .choice()
        .when(xpath("//Tag1")).setHeader("execOrder", constant(3)).setHeader("xmlRoute", constant("direct:some-route"))
        .when(xpath("//Tag2")).setHeader("execOrder", constant(1)).setHeader("xmlRoute", constant("direct:some-other-route"))
        .when(xpath("//Tag3")).setHeader("execOrder", constant(2)).setHeader("xmlRoute", constant("direct:yet-another-route"))
        .otherwise()
            .to("direct:somewhereelse")
    .end()
    .to("direct:resequencing");
// Route 3: reorder the batch and dispatch each exchange to its target route
from("direct:resequencing")
    .resequence(header("execOrder"))
        .batch(new BatchResequencerConfig(300, 10000L))
        .allowDuplicates()
    .recipientList(header("xmlRoute"));
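For readers unfamiliar with batch resequencing, the reordering step can be pictured roughly as the plain-Java sketch below. It is not how Camel implements the resequencer. FileJob and BatchReorder are hypothetical names, and the sketch assumes the whole batch is already collected. It only shows the effect of sorting by execOrder while keeping entries that share the same value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: a plain-Java stand-in for reordering a collected batch
// by its "execOrder" value. FileJob is a hypothetical type.
final class BatchReorder {

    static final class FileJob {
        final String path;
        final int execOrder;
        final String route;

        FileJob(String path, int execOrder, String route) {
            this.path = path;
            this.execOrder = execOrder;
            this.route = route;
        }
    }

    // Returns a new list sorted by execOrder. List.sort is stable, so jobs
    // with the same execOrder keep their arrival order and none are dropped.
    static List<FileJob> reorder(List<FileJob> batch) {
        List<FileJob> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparingInt((FileJob job) -> job.execOrder));
        return sorted;
    }

    public static void main(String[] args) {
        List<FileJob> batch = Arrays.asList(
                new FileJob("a.xml", 3, "direct:some-route"),
                new FileJob("b.xml", 1, "direct:some-other-route"),
                new FileJob("c.xml", 2, "direct:yet-another-route"));
        for (FileJob job : reorder(batch)) {
            System.out.println(job.execOrder + " " + job.path + " -> " + job.route);
        }
    }
}
```

Because the jobs carry file paths rather than open streams, nothing has to stay open while the batch waits to be reordered.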

Related

Spring Outbound SFTP integration streaming

We are developing a Spring Batch application which is going to process "big" files in the future. To maintain a low memory footprint, we run Spring Batch on the smallest possible chunks of these files.
After processing, we want to write a result back to SFTP, which also happens per chunk of the input file.
The current approach is as follows:
StepExecutionListener.before(): we send a message to the SftpOutboundAdapter with FileExistsMode.REPLACE and empty payload to create an empty file (with .writing)
Reader: will read the input file
Processor: will enhance the input with the results and return a list of string
Writer: will send the list of strings to another SftpOutboundAdapter with FileExistsMode.APPEND
StepExecutionListener.after(): In case the execution was successful we will rename the file to remove the .writing suffix.
Now I saw that there are Streaming Inbound Adapters but I could not find Streaming Outbound Adapters.
Is this really the only/best way to solve it by append? Or is it possible to stream the file content?

Apache Camel downloads some files incompletely from SFTP

I've been struggling to get to the bottom of why it is that some files are not correctly downloaded.
It seems like certain files just won't be downloaded fully, even when testing locally and restarting my application.
To make matters more difficult it is not always consistent.
Info:
Apache Camel version: 2.20.0
Integrated into Spring-Boot application using the camel-spring-boot-starter
Files are about 190M
Files download ok using standalone Jsch and Linux sftp client
Heap size set to 1G and memory usage doesn't even get close to the max
Camel doesn't detect anything wrong with the download, even if number of bytes written is tens of megabytes less than the length of the file according to camel headers (camel headers have correct file length)
I've observed the issue with org.apache.camel logging set to TRACE without seeing anything strange in the logs.
Idempotent repo is updated as if the file was processed correctly
I see the same issue on Linux and Windows
Any advice on what the issue might be or suggestions for how to troubleshoot would be awesome!
Route config (a bit artificially created since values come from spring-boot config):
public class FileRouteBuilder extends RouteBuilder {
    // Cut
    @Override
    public void configure() throws Exception {
        errorHandler(deadLetterChannel("seda:" + ROUTE_ID_ERROR_EMAIL));
        from("sftp://username@hostname/OUT?noop=true&streamDownload=true&password=password&include=Data_file.*csv&idempotentRepository=#keyRepo&greedy=true&delay=5m&maxMessagesPerPoll=10&readLock=changed")
            .id(routeConfig.getRouteId())
            .routeDescription(routeConfig.getRouteId())
            .setHeader(HEADER_FILE_SOURCE, constant(routeConfig.getRouteId()))
            .to("log:feeds." + routeConfig.getRouteId() + "?level=INFO&showAll=true")
            // Exclude all files older than the specified number of hours
            .filter(new FileModifiedSincePredicate(24))
            .to("file:rootDir/DATA")
            .to("seda:" + ROUTE_ID_ACTIVITY_EMAIL_NOTIFICATION)
            .end();
    }
}
Update1
Observations after adding binary=true.
First two files are downloaded correctly but the 3rd and final file on the server is not.
193255587 Data_File_12.csv
191072548 Data_File_15.csv
139929360 Data_File_16.csv
The correct file size of the Data_File_16.csv file is 192867682 bytes, which is captured correctly in the CamelFileLength header.
Update 2
Removed all the log and seda email components above, and re-ran.
The third file still doesn't get completely written.
Adding the relevant DEBUG level log output in the hope that it sheds some light on what is going on or perhaps rules out certain things.
From what I can tell the log doesn't show anything suspicious and there is no hint that the _16 file is incompletely written.
Is there anything which could be happening on the SFTP server that anyone is aware of that it is worth checking with the provider?
o.a.c.c.file.remote.SftpConsumer : Took 0.194 seconds to poll: OUT
o.a.c.c.file.remote.SftpConsumer : Total 3 files to consume
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_12.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-1]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_12.csv
o.a.camel.converter.jaxp.XmlConverter : Created TransformerFactory: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl@d9dfe93
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_12.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_12.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-1]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_12.csv-193255587 to idempotent filestore: target\file-dest\.file-key-repo\repo
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_15.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-2]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_15.csv
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_15.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_15.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-2]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_15.csv-191072548 to idempotent filestore: target\file-dest\.file-key-repo\repo
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_16.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-3]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_16.csv
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_16.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_16.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-3]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_16.csv-192867682 to idempotent filestore: target\file-dest\.file-key-repo\repo
Ah, you log the message after you download it, and you use streamDownload=true.
See the FAQ entry "Why is my message body empty?" for why you need stream caching if you do that.
Because the message body is stream-based, either enable stream caching or do NOT log the message body (you can still log headers etc.) and route it straight to the file endpoint so it's saved directly as a file.
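The underlying point, that a stream-based body can only be consumed once, can be seen in a few lines of plain Java. This is only a sketch of the general behaviour of an InputStream; it does not use Camel and does not claim to reproduce the exact fault in the question. OneShotStream and drain are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustration only: whatever reads a stream first leaves less (or nothing)
// for the next reader.
final class OneShotStream {

    // Reads the stream to its end and returns how many bytes were seen.
    static long drain(InputStream in) {
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Case 1: a first consumer (for example a logger) reads everything.
        InputStream body = new ByteArrayInputStream(new byte[1000]);
        long logged = drain(body);
        long written = drain(body);
        System.out.println("logger saw " + logged + " bytes, writer saw " + written);

        // Case 2: a first consumer reads only part of the body.
        InputStream partial = new ByteArrayInputStream(new byte[1000]);
        int peeked = partial.read(new byte[300]);
        System.out.println("peeked " + peeked + " bytes, writer saw " + drain(partial));
    }
}
```

Stream caching avoids this by keeping a re-readable copy of the body, at the cost of memory or disk space for that copy.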

Requested file does not exist Mule SFTP

I am polling from SFTP in MuleSoft every second; fileAge is set to 0, the connection pool size is 1 and autoDelete is enabled. Then I save the file to a directory with a File connector that polls every 2 seconds with a fileAge of 500 (this is the outbound endpoint). The next flow starts with this same directory as its File inbound endpoint and processes the file. Here polling is set to every 3 seconds and autoDelete is enabled. I get this error, but the file is processed:
java.io.IOException: The requested file does not exist (//file/7ggot1517.txt)
at org.mule.transport.sftp.SftpClient.getSize(SftpClient.java:499)
at org.mule.transport.sftp.SftpClient.retrieveFile(SftpClient.java:378)
...
Does anyone know how to configure the SFTP and File connectors to:
1. Read the file from SFTP and delete it from SFTP
2. Process the file from the local directory and delete it
3. Get rid of that error
Thank you
Can you try the configuration below? I tried reading a file from FTP to a local directory.
Replace FTP with SFTP.
Use the small Groovy script provided in it. This should work; I just tested it and it works as expected. Deleting can be done with the autoDelete attribute or fileAge. Please let me know if this helps.
<flow name="ftptestFlow">
<ftp:inbound-endpoint host="hostname" port="port" path="path/filename" user="userid" password="password" responseTimeout="10000" doc:name="FTP"/>
<set-variable variableName="fileName" value="fileName" doc:name="fileName"/>
<scripting:component doc:name="getFile">
<scripting:script engine="Groovy"><![CDATA[new File(flowVars.fileName).getText('UTF-8')]]></scripting:script>
</scripting:component>
<file:outbound-endpoint path="path" outputPattern="filename" responseTimeout="10000" doc:name="File"/>
</flow>
Your SFTP inbound endpoint probably polls the file a first time, but a second poll starts before the first one has had a chance to delete the file. Something like this happens:
First poll - a file is found, let's read it => OK
First poll - read the file and process it => OK
Second poll - a file is found, let's read it => OK
First poll - processing finished, delete the file => OK
Second poll - read the file and process it => Error: file has been deleted
As you can see, the second poll detects the file before the first poll deletes it, but by the time it tries to read it, the first poll has already deleted the file.
You can use the tempDir attribute on your SFTP inbound endpoint, it will move the file to a sub-directory of the folder where it is read before processing, ensuring subsequent polls are not triggered for the same file again. It then does something like:
First poll - a file is found, move it to tempDir and let's read it => OK
First poll - read the file and process it => OK
Second poll - No file found (it has been moved!) => OK
First poll - processing finished, delete the file => OK
Such as:
<sftp:inbound-endpoint connector-ref="SFTP"
tempDir="${ftp.path}/tmpPoll"
host="${ftp.host}"
port="${ftp.port}"
path="${ftp.path}"
user="${ftp.user}"
password="${ftp.password}" doc:name="SFTP" responseTimeout="10000"/>
You also need to make sure the SFTP user can read/write the sub-dir or create it if necessary. Everything is documented here.
EDIT: and to delete your file from the local machine you can simply use a Java or Groovy component once it has been properly processed
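try {
    Files.delete(filePath); // java.nio.file.Files; filePath is the Path of the processed file
} catch (IOException e) {
    // the file is already gone or cannot be deleted: log it and decide whether to retry
}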
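The reason moving the file before processing removes the race can be sketched in plain Java on a local file system. This is not Mule's tempDir implementation, only an illustration of the idea: whoever moves the file first owns it, and a later poll finds nothing to pick up. ClaimByMove, claim and workDir are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Optional;

// Illustration only: "claim" a file by moving it into a working directory
// before processing it.
final class ClaimByMove {

    // Returns the file's new location, or empty if the file is no longer
    // there because another poll already moved it.
    static Optional<Path> claim(Path file, Path workDir) {
        try {
            Files.createDirectories(workDir);
            Path target = workDir.resolve(file.getFileName());
            return Optional.of(Files.move(file, target));
        } catch (NoSuchFileException e) {
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path inbox = Files.createTempDirectory("inbox");
        Path file = Files.createFile(inbox.resolve("7ggot1517.txt"));
        Path workDir = inbox.resolve("tmpPoll");

        Optional<Path> firstPoll = claim(file, workDir);
        Optional<Path> secondPoll = claim(file, workDir);
        System.out.println("first poll claimed:  " + firstPoll.isPresent());
        System.out.println("second poll claimed: " + secondPoll.isPresent());
    }
}
```

The second poll gets an empty result instead of failing later with a "file does not exist" error halfway through processing.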

Mulesoft Dataweave, LDAP to SOAP large message truncating at certain size. Size limit?

(question tldr at end)
So my task for the Mule "Transform Message" component is to take a bunch of user info from LDAP Directory Service and provide it to an old database endpoint using SOAP. Fairly simple transform stuff.
The main concern with this operation is the size of the message that has to be provided to the endpoint. The entire payload has to be provided in a single message, otherwise the service will remove all entries that are not part of the payload (there is no explicit 'delete' service). This is an issue because the directory holds roughly 20,000 users, which makes every message about 5MB in size.
My flow in Mule Studio currently works with a low amount of users being returned from the LDAP component. Successful return from the endpoint and I can see the data updated in the legacy environment. When applying this to a more 'production-realistic' load the Web Service Consumer (SOAP) craps out with an odd exception (unexpected EOF/character).
So I stuck a File component in the middle to dumpcheck the message that was being sent to the Consumer. The message is indeed getting cut before it can finish, which is where the EOF is coming from.
This is the transform script in Dataweave.
%output application/xml
%namespace ns0 test.namespace.com
---
{
ns0#updateContact: {
ns0#ContactType: "Primary",
ns0#ContactDetails: {
(payload map {
(ns0#ContactDetailElem: {
ns0#personID: $.personID,
ns0#contactDetail: $.desc
}) when $.personID != null
})
}
}
}
The expected output is below and successfully occurs with a lesser payload.
<?xml version='1.0' encoding='windows-1252'?>
<ns0:updateContact xmlns:ns0="test.namespace.com">
<ns0:ContactType>Primary</ns0:ContactType>
<ns0:ContactDetails>
<../>
<ns0:ContactDetailElem>
<ns0:personID>{Integer}</ns0:personID>
<ns0:contactDetail>{String.detail}</ns0:contactDetail>
</ns0:ContactDetailElem>
<../>
</ns0:ContactDetails>
</ns0:updateContact>
On the big payload the following happens at the end of the file
<?xml version='1.0' encoding='windows-1252'?>
<ns0:updateContact xmlns:ns0="test.namespace.com">
<ns0:ContactType>Primary</ns0:ContactType>
<ns0:ContactDetails>
<../>
<ns0:ContactDetailElem>
<ns0:personID>{Integer}</ns0:personID>
<ns0:contactDetail>{String.detail}</ns0:contactDetail>
</ns0:ContactDeta
This looks like a typo, but it is the message being cut off before it can finish. The file always stops at 3,553,099 characters. Of course this makes the flow crap out, as the XML is invalid.
The question, then: is there a limit on the message size that the DataWeave transformer can create? If this is a configuration issue rather than a legitimate bug, where would I find the setting? I've had a look around but can't find anybody else encountering this type of issue.
TL;DR: Do Dataweave transform messages have a size limit around 3.38MB?
Exception caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
PS: I've found the documentation on dataweave streaming after typing this up, will see if this can help my situation. Otherwise i'm considering implementing a workaround to construct the message outside dataweave and then passing it to the Consumer.
Are you using Mule version 3.8.3? Try 3.8.4, it fixed a bug in DataWeave which caused cutoff of Strings in some cases.
We have a similar problem, also related to message size. We implemented streaming using StAX.
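The answer above gives no code, so the following is only a rough sketch of what building the same message with StAX could look like, using the JDK's javax.xml.stream API. Contact and ContactStreamWriter are hypothetical names, and UTF-8 is chosen here for simplicity rather than the windows-1252 shown in the question. Each element is written to the output stream as it is produced, so the full document never has to exist as one string in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Sketch only: streams the updateContact document element by element.
final class ContactStreamWriter {

    static final String NS = "test.namespace.com";

    // Hypothetical input record, mirroring the fields used in the transform.
    static final class Contact {
        final Integer personID;
        final String desc;

        Contact(Integer personID, String desc) {
            this.personID = personID;
            this.desc = desc;
        }
    }

    static void write(Iterable<Contact> contacts, OutputStream out) {
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("ns0", "updateContact", NS);
            w.writeNamespace("ns0", NS);
            textElement(w, "ContactType", "Primary");
            w.writeStartElement("ns0", "ContactDetails", NS);
            for (Contact c : contacts) {
                if (c.personID == null) {
                    continue; // same filter as "when $.personID != null"
                }
                w.writeStartElement("ns0", "ContactDetailElem", NS);
                textElement(w, "personID", String.valueOf(c.personID));
                textElement(w, "contactDetail", c.desc);
                w.writeEndElement();
            }
            w.writeEndElement(); // ContactDetails
            w.writeEndElement(); // updateContact
            w.writeEndDocument();
            w.flush();
            w.close(); // does not close the underlying OutputStream
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes <ns0:name>text</ns0:name>, treating null text as empty.
    private static void textElement(XMLStreamWriter w, String name, String text) throws XMLStreamException {
        w.writeStartElement("ns0", name, NS);
        w.writeCharacters(text == null ? "" : text);
        w.writeEndElement();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(Arrays.asList(new Contact(1, "first"), new Contact(null, "skipped"), new Contact(2, "second")), out);
        System.out.println(out.toString("UTF-8"));
    }
}
```

In a real flow the OutputStream would be a file or the HTTP request body instead of a ByteArrayOutputStream, which is used here only so the sketch can print its result.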

EOFException in JMeter

If I use 2 users (threads) in a thread group, the first 2 rows of test data are picked up through CSV Data Set Config in the 1st iteration, but the following rows are not picked up by JMeter in subsequent iterations during playback, and an EOFException is displayed in the JMeter log. Can anyone suggest a solution?
Jmeter log:
2014/12/16 03:05:23 WARN - jmeter.threads.JMeterThread: The delay timer was interrupted - probably did not wait as long as intended.
2014/12/16 03:05:23 ERROR - jmeter.protocol.http.sampler.HTTPJavaImpl: readResponse: java.io.EOFException
2014/12/16 03:05:23 INFO - jmeter.protocol.http.sampler.HTTPJavaImpl: Error Response Code: 200, Server sent no Errorpage
2014/12/16 03:05:23 ERROR - jmeter.protocol.http.sampler.HTTPJavaImpl: readResponse: java.io.EOFException
2014/12/16 03:05:23 INFO - jmeter.protocol.http.sampler.HTTPJavaImpl: Error Response Code: 200, Server sent no Errorpage
There are 2 possible reasons:
You're using the wrong path to the CSV file (the most frequent cause is using a relative CSV file path without being sure of JMeter's current working directory). Solution: use full paths where possible.
You have "Recycle on EOF" set to "false" in your CSV Data Set Config.
See Using CSV DATA SET CONFIG guide for more details on where to place and how to properly configure CSV Data Set Config test element.
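To picture what the second cause means, here is a small plain-Java model of a row source shared by several threads. It is not JMeter's implementation; SharedRows and nextRow are hypothetical names. With two rows and two users, the first iteration uses up both rows, and what happens next depends on whether the source wraps around at the end of the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Illustration only: a shared row source with an optional wrap-around,
// loosely modelled on the idea behind "Recycle on EOF".
final class SharedRows {

    private final List<String> rows;
    private final boolean recycle;
    private int next = 0;

    SharedRows(List<String> rows, boolean recycle) {
        this.rows = new ArrayList<>(rows);
        this.recycle = recycle;
    }

    // Hands out the next row. At the end of the data it either starts over
    // (recycle = true) or reports that nothing is left (recycle = false).
    synchronized Optional<String> nextRow() {
        if (rows.isEmpty()) {
            return Optional.empty();
        }
        if (next == rows.size()) {
            if (!recycle) {
                return Optional.empty();
            }
            next = 0;
        }
        return Optional.of(rows.get(next++));
    }

    public static void main(String[] args) {
        SharedRows noRecycle = new SharedRows(Arrays.asList("user1,pw1", "user2,pw2"), false);
        System.out.println(noRecycle.nextRow()); // iteration 1, thread 1
        System.out.println(noRecycle.nextRow()); // iteration 1, thread 2
        System.out.println(noRecycle.nextRow()); // iteration 2: no data left

        SharedRows withRecycle = new SharedRows(Arrays.asList("user1,pw1", "user2,pw2"), true);
        withRecycle.nextRow();
        withRecycle.nextRow();
        System.out.println(withRecycle.nextRow()); // iteration 2: starts over
    }
}
```

In JMeter itself, what a thread does when the data runs out also depends on the "Stop thread on EOF" setting of the same config element.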
