I've been struggling to get to the bottom of why it is that some files are not correctly downloaded.
It seems like certain files just won't be downloaded fully, even when testing locally and restarting my application.
To make matters more difficult it is not always consistent.
Info:
Apache Camel version: 2.20.0
Integrated into Spring-Boot application using the camel-spring-boot-starter
Files are about 190M
Files download ok using standalone Jsch and Linux sftp client
Heap size set to 1G and memory usage doesn't even get close to the max
Camel doesn't detect anything wrong with the download, even if number of bytes written is tens of megabytes less than the length of the file according to camel headers (camel headers have correct file length)
I've observed the issue with org.apache.camel logging set to TRACE without seeing anything strange in the logs.
Idemoptent repo is updated as if the file was processed correctly
I see the same issue on Linux and Windows
Any advise on what the issue might be or suggestions for how to troubleshoot would be awesome!
Route config (a bit artificially created since values come from spring-boot config):
public class FileRouteBuilder extends RouteBuilder {
// Cut
#Override
public void configure() throws Exception {
errorHandler(deadLetterChannel("seda:"+ROUTE_ID_ERROR_EMAIL));
from("sftp://username#hostname/OUT?noop=true&streamDownload=true&password=password&include=Data_file.*csv&idempotentRepository=#keyRepo&greedy=true&delay=5m&maxMessagesPerPoll=10&readLock=changed")
.id(routeConfig.getRouteId())
.routeDescription(routeConfig.getRouteId())
.setHeader(HEADER_FILE_SOURCE, constant(routeConfig.getRouteId()))
.to("log:feeds." + routeConfig.getRouteId() + "?level=INFO&showAll=true")
// Exclude all files oder than the specified number of hours
.filter(new FileModifiedSincePredicate(24))
.to(file:rootDir/DATA)
.to("seda:" + ROUTE_ID_ACTIVITY_EMAIL_NOTIFICATION)
.end();
}
}
}
Update1
Observations after adding binary=true.
First two files are downloaded correctly but the 3rd and final file on the server is not.
193255587 Data_File_12.csv
191072548 Data_File_15.csv
139929360 Data_File_16.csv
The correct file size of teh Data_FIle_16.csv file is 192867682 bytes, which is captured correctly in the the CamelFileLength header.
Update 2
Removed all the log and seda email components above, and re-ran.
The third file still doesn't get completely written.
Adding the relevant DEBUG level log output in the hope that it sheds some light on what is going on or perhaps rules out certain things.
From what I can tell the log doesn't show anything suspicious and there is not hint that the _16 file is incompletely written.
Is there anything which could be happening on the SFTP server that anyone is aware of that it is worth checking with the provider?
o.a.c.c.file.remote.SftpConsumer : Took 0.194 seconds to poll: OUT
o.a.c.c.file.remote.SftpConsumer : Total 3 files to consume
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_12.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-1]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_12.csv
o.a.camel.converter.jaxp.XmlConverter : Created TransformerFactory: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl#d9dfe93
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_12.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_12.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-1]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_12.csv-193255587 to idempotent filestore: target\file-dest\.file-key-repo\repo
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_15.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-2]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_15.csv
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_15.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_15.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-2]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_15.csv-191072548 to idempotent filestore: target\file-dest\.file-key-repo\repo
o.a.c.c.file.remote.SftpConsumer : About to process file: RemoteFile[Data_File_16.csv] using exchange: Exchange[]
o.apache.camel.processor.SendProcessor : >>>> file://target/file-dest/MISA Exchange[ID-LON-2016-1516204084378-0-3]
o.a.camel.component.file.FileOperations : Using InputStream to write file: target\file-dest\MISA\Data_File_16.csv
o.a.c.c.file.GenericFileProducer : Wrote [target\file-dest\MISA\Data_File_16.csv] to [file://target/file-dest/MISA]
o.a.c.c.file.GenericFileOnCompletion : Done processing file: RemoteFile[Data_File_16.csv] using exchange: Exchange[ID-LON-2016-1516204084378-0-3]
o.a.c.p.i.FileIdempotentRepository : Appending Data_File_16.csv-192867682 to idempotent filestore: target\file-dest\.file-key-repo\repo
Ah you log the message after you download it, and you use streamDownload=true.
See this FAQ-why-is-my-message-body-empty and how you need to use stream caching if doing so.
Because the message is streaming based, then either do NOT log the message body (you can log headers etc) and then route it to the file endpoint so its saved directly as a file.
Related
We are developing a spring batch application which is going to process "big" files in the future. To maintain a low memory signature we use spring batch on the smallest possible chunks of these files.
After processing, we want to write a result back to SFTP, which also happens per chunk of the input file.
The current approach is as follows:
StepExecutionListener.before(): we send a message to the SftpOutboundAdapter with FileExistsMode.REPLACE and empty payload to create an empty file (with .writing)
Reader: will read the input file
Processor: will enhance the input with the results and return a list of string
Writer: will send the list of strings to another SftpOutboundAdapter with FileExistsMode.APPEND
StepExecutionListener.after(): In case the execution as successful we will rename the file to remove the .writing suffix.
Now I saw that there are Streaming Inbound Adapters but I could not find Streaming Outbound Adapters.
Is this really the only/best way to solve it by append? Or is it possible to stream the file content?
I have a direct Camel route triggered by a restful. The restful passes the name of a file (which needs to be processed) inside the Exchange body.
The route is very simple:
from("direct:myRoute")
.log("Reading file with name ${in.body}")
.pollEnrich().simple(inboundUri).timeout(5000)
.choice()
.when(body().isNull())
.log("Cannot read file. Body is null")
.otherwise()
.log("Processing file: ${in.headers.CamelFileAbsolutePath}")
...
where inboundUri is:
smb://DOMAIN;username:password#myLocation/myFolder/?include=${in.body}.csv&delay=5000&noop=true&idempotent=false&readLock=none&recursive=false&sortBy=reverse:file:modified
The first time I trigger this route I always get "Cannot read file. Body is null".
But if I trigger it again, it then works fine and the file gets processed.
Any idea why?
P.S. I've tried to set CAMEL in DEBUG mode, but I struggle to understand what it does. The first time I run it I get things like:
DefaultCamelContext : Using ComponentResolver: org.apache.camel.impl.DefaultComponentResolver#1016b44e to resolve component with name: smb
ResolverHelper : Lookup Component with name smb in registry. Found: null
ResolverHelper : Lookup Component with name smb-component in registry. Found: null
DefaultComponentResolver : Found component: smb via type: org.apacheextras.camel.component.jcifs.SmbComponent via: META-INF/services/org/apache/camel/component/smb
DefaultManagementAgent : Registered MBean with ObjectName: org.apache.camel:context=camel-1,type=components,name="smb"
...
PollEnricher : Consumer received no exchange
FilterProcessor : Filter matches: true for exchange: Exchange[ID-server-43626-1517937470434-0-2]
The second time the output is much shorter and the main differences seems to be:
ServiceHelper : Resuming service Consumer[smb://DOMAIN;username:password#myLocation/myFolder/?include=${in.body}.csv&delay=5000&noop=true&idempotent=false&readLock=none&recursive=false&sortBy=reverse:file:modified]
PollEnricher : Consumer received: Exchange[]
FilterProcessor : Filter matches: false for exchange: Exchange[ID-server-43626-1517937470434-0-4]
Set the delay value in your inbound uri to a lower value as it has a delay of 5000 which is the same as your timeout, so that does not allow enough time for it to run. Set it to 1000 or 500 or something.
Scenario:
I am trying to stream and process some large xml files. These files are send from a producer asynchronously.
producerTemplate.sendBodyAndHeaders(endpointUri, inStream, ImmutableMap.of(JOBID_PROPERTY, importJob.getId()));
I need to batch all file input streams, identify the files by probing them with xpath and reorder them according to their content. I have the following route:
from("direct:route1")
.streamCaching()
.choice()
.when(xpath("//Tag1")) .setHeader("execOrder", constant(3)) .setHeader("xmlRoute", constant( "direct:some-route"))
.when(xpath("//Tag2")) .setHeader("execOrder", constant(1)) .setHeader("xmlRoute", constant( "direct:some-other-route"))
.when(xpath("//Tag3")) .setHeader("execOrder", constant(2)) .setHeader("xmlRoute", constant( "direct:yet-another-route"))
.otherwise()
.to("direct:somewhereelse")
.end()
.resequence(header("execOrder"))
.batch(new BatchResequencerConfig(300, 10000L))
.allowDuplicates()
.recipientList(header("xmlRoute"))
When running my code I get the following error:
2017-11-23 11:43:13.442 INFO 10267 --- [ - Batch Sender] c.w.n.s.m.DefaultImportJobService : Updating entity ImportJob with id 5a16a61803af33281b22c716
2017-11-23 11:43:13.451 WARN 10267 --- [ - Batch Sender] org.apache.camel.processor.Resequencer : Error processing aggregated exchange: Exchange[ID-int-0-142-bcd-wsint-pro-59594-1511433568520-0-20]. Caused by: [org.apache.camel.RuntimeCamelException - Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp]
org.apache.camel.RuntimeCamelException: Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp
at org.apache.camel.converter.stream.FileInputStreamCache.reset(FileInputStreamCache.java:91)
I've read here that the FileInputStreamCache is closed when the XPathBuilder.getDocument() is called, and the temp file is deleted, so you get the FileNotFoundException when the XPathBuilder wants to reset the InputStream
The solution seems to be to disable the spooling to disk like that:
camelContext.getStreamCachingStrategy().setSpoolThreshold(-1);
However, I don't want to do that because of RAM restrictions, i.e. files can get up to 600MB and I don't want to keep them in memory. Any ideas how to solve the problem?
The resequencer is a two-leg pattern (stateful) and will cause the original exchange to be done beforehand, as its keeping a copy in memory while re-sequencing until the gap is fulfilled and sending the messages out in the new order.
Since your input stream comes from some HTTP service then that would be closed beforehand the resequencer may output the exchange.
Either do as suggested to store to local disk first, and then let the resequencer work on that, or find a way not to use the resequencer.
I ended up doing what Claus and Ricardo suggested. I made a separate route which saves the files to disk. Then another one which probes the files and resequences the exchanges according to a fixed order.
String xmlUploadDirectory = "file://" + Files.createTempDir().path + "/xmls?noop=true"
from("direct:route1")
.to(xmlUploadDirectory)
from(xmlUploadDirectory)
.choice()
.when(xpath("//Tag1")).setHeader("execOrder", constant(3)).setHeader("xmlRoute", constant( "direct:some-route"))
.when(xpath("//Tag2")).setHeader("execOrder", constant(1)).setHeader("xmlRoute", constant( "direct:some-other-route"))
.when(xpath("//Tag3")).setHeader("execOrder", constant(2)).setHeader("xmlRoute", constant( "direct:yet-another-route"))
.otherwise()
.to("direct:somewhereelse")
.end()
.to("direct:resequencing")
from("direct:resequencing")
.resequence(header("execOrder"))
.batch(new BatchResequencerConfig(300, 10000L))
.allowDuplicates()
.recipientList(header("xmlRoute"))
(question tldr at end)
So my task for the Mule "Transform Message" component is to take a bunch of user info from LDAP Directory Service and provide it to an old database endpoint using SOAP. Fairly simple transform stuff.
The main ! about this operation is the size of the message that has to be provided to the endpoint. The entire payload has to be provided in a single message, otherwise the service will remove all entries that are not part of the payload (there is no explicit 'delete' service). This is an issue because the amount of users in the directory is roughly 20,000 causing every message to be 5MB or so in size.
My flow in Mule Studio currently works with a low amount of users being returned from the LDAP component. Successful return from the endpoint and I can see the data updated in the legacy environment. When applying this to a more 'production-realistic' load the Web Service Consumer (SOAP) craps out with an odd exception (unexpected EOF/character).
So I stuck a File component in the middle to dumpcheck the message that was being sent to the Consumer. The message is indeed getting cut before it can finish, which is where the EOF is coming from.
This is the transform script in Dataweave.
%output application/xml
%namespace ns0 test.namespace.com
---
{
ns0#updateContact: {
ns0#ContactType: "Primary",
ns0#ContactDetails: {
(payload map {
(ns0#ContactDetailElem: {
ns0#personID: $.personID,
ns0#contactDetail: $.desc
}) when $.personID != null
})
}
}
}
The expected output is below and successfully occurs with a lesser payload.
<?xml version='1.0' encoding='windows-1252'?>
<ns0:updateContact xmlns:ns0="test.namespace.com">
<ns0:ContactType>Primary</ns0:ContactType>
<ns0:ContactDetails>
<../>
<ns0:ContactDetailElem>
<ns0:personID>{Integer}</ns0:personID>
<ns0:contactDetail>{String.detail}</ns0:contactDetail>
</ns0:ContactDetailElem>
<../>
</ns0:ContactDetails>
</ns0:updateContact>
On the big payload the following happens at the end of the file
<?xml version='1.0' encoding='windows-1252'?>
<ns0:updateContact xmlns:ns0="test.namespace.com">
<ns0:ContactType>Primary</ns0:ContactType>
<ns0:ContactDetails>
<../>
<ns0:ContactDetailElem>
<ns0:personID>{Integer}</ns0:personID>
<ns0:contactDetail>{String.detail}</ns0:contactDetail>
</ns0:ContactDeta
Which looks like a typo but is what looks like the message being cut before it can finish. The file size is always stopped at 3,553,099 characters. Of course this makes the flow crap out as the xml is invalid.
The question then is there a limit on the message size that the Dataweave transformer can create? If not a legitimate bug but a configuration issue, where would I find this setting? I've had a look around but can't find anybody encounter this type of issue.
TL;DR: Do Dataweave transform messages have a size limit around 3.38MB?
Exception caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
PS: I've found the documentation on dataweave streaming after typing this up, will see if this can help my situation. Otherwise i'm considering implementing a workaround to construct the message outside dataweave and then passing it to the Consumer.
Are you using Mule version 3.8.3? Try 3.8.4, it fixed a bug in DataWeave which caused cutoff of Strings in some cases.
We have a similar problem, same as yours that is with the problem of size. We implemented streaming using stax.
If I'm using 2 users for a thread group, first 2 test data are captured through CSV Data Set Config in the 1st iteration, but the next test data are not captured by JMeter in the next consecutive iterations in the playback time. And EOFException is displayed in jmeter log. Can anyone provide me any solution for it ?
Jmeter log:
*2014/12/16 03:05:23 WARN - jmeter.threads.JMeterThread: The delay timer was interrupted - probably did not wait as long as intended.
2014/12/16 03:05:23 ERROR - jmeter.protocol.http.sampler.HTTPJavaImpl: readResponse: java.io.EOFException
2014/12/16 03:05:23 INFO - jmeter.protocol.http.sampler.HTTPJavaImpl: Error Response Code: 200, Server sent no Errorpage
2014/12/16 03:05:23 ERROR - jmeter.protocol.http.sampler.HTTPJavaImpl: readResponse: java.io.EOFException
2014/12/16 03:05:23 INFO - jmeter.protocol.http.sampler.HTTPJavaImpl: Error Response Code: 200, Server sent no Errorpage*
There are 2 possible reasons:
You're using wrong path to CSV file (the most frequent cause is using relative CSV file path without being sure in current JMeters working directory). Solution is: use full paths where possible.
You have "Recycle on EOF" set to "false" in your CSV Data Set Config
See Using CSV DATA SET CONFIG guide for more details on where to place and how to properly configure CSV Data Set Config test element.