Spring Outbound SFTP integration streaming - java

We are developing a Spring Batch application that will process "big" files in the future. To keep the memory footprint low, we process these files with Spring Batch in the smallest possible chunks.
After processing, we want to write a result back to SFTP, which also happens per chunk of the input file.
The current approach is as follows:
StepExecutionListener.before(): we send a message to the SftpOutboundAdapter with FileExistsMode.REPLACE and an empty payload to create an empty file (with a .writing suffix)
Reader: will read the input file
Processor: will enhance the input with the results and return a list of strings
Writer: will send the list of strings to another SftpOutboundAdapter with FileExistsMode.APPEND
StepExecutionListener.after(): if the execution was successful, we rename the file to remove the .writing suffix.
Now I saw that there are streaming inbound adapters, but I could not find streaming outbound adapters.
Is appending really the only/best way to solve this, or is it possible to stream the file content?
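For illustration, here is a minimal sketch of the append-based writer described above, assuming Spring Integration's SFTP Java DSL is available; the channel name "chunkWriterChannel", the remote directory and the "targetFileName" header are placeholders, not part of the original setup:

import com.jcraft.jsch.ChannelSftp;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.file.remote.session.SessionFactory;
import org.springframework.integration.file.support.FileExistsMode;
import org.springframework.integration.sftp.dsl.Sftp;

@Configuration
public class SftpAppendConfig {

    // Hypothetical flow: each chunk produced by the ItemWriter is sent to "chunkWriterChannel"
    // and appended to the remote ".writing" file; the rename still happens in the after-step listener.
    @Bean
    public IntegrationFlow sftpAppendFlow(SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory) {
        return IntegrationFlows.from("chunkWriterChannel")
                .handle(Sftp.outboundAdapter(sftpSessionFactory, FileExistsMode.APPEND)
                        .useTemporaryFileName(false) // the ".writing" suffix is managed by the listeners
                        .remoteDirectory("/upload")
                        .fileNameGenerator(m -> m.getHeaders().get("targetFileName", String.class)))
                .get();
    }
}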

Related

Project Reactor Kafka: Perform action at the end of Flux without blocking

I am working on an application that uses the project-reactor Kafka APIs to connect reactively to Kafka brokers. The use case is that there is an input topic which contains file paths to the files for processing. The application reads each file, processes it, creates a flux of the processed messages and pushes it to the output topic. The requirement is that the file must be deleted once it has been processed and the processed messages have been pushed to the output topic. So, the delete action must be performed after each file has been processed and its flux of messages has been pushed to the output topic.
public Flux<?> flux() {
    return KafkaReceiver
            .create(receiverOptions(Collections.singleton(sourceTopic)))
            .receive()
            .flatMap(m -> transform(m.value())
                    .map(x -> SenderRecord.create(x, m.receiverOffset())))
            .as(sender::send)
            .doOnNext(m -> {
                m.correlationMetadata().acknowledge();
                deleteFile(path);
            })
            .doOnCancel(() -> close());
}
* The transform() method initiates processing of the file at the given path (m.value()) and returns a flux of messages.
The problem is that the file is deleted even before all the messages are pushed to the output topic. Therefore, in case of a failure, the original file is no longer available on retry.
Since it seems the path variable is accessible in the whole pipeline (method input parameter?), you could delete the file within a separate doFinally. You would need to filter for onComplete or cancel SignalType, because you don't want to delete the file in case of a failure.
Another option would be doOnComplete if you're not interested in deleting the file upon cancellation.
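Concretely, a minimal sketch of the doFinally variant applied to the pipeline from the question (sourceTopic, transform, sender, deleteFile, path and close are assumed from the surrounding code; SignalType comes from reactor.core.publisher):

public Flux<?> flux() {
    return KafkaReceiver
            .create(receiverOptions(Collections.singleton(sourceTopic)))
            .receive()
            .flatMap(m -> transform(m.value())
                    .map(x -> SenderRecord.create(x, m.receiverOffset())))
            .as(sender::send)
            .doOnNext(m -> m.correlationMetadata().acknowledge())
            .doFinally(signal -> {
                // delete only after the whole flux completed; also check for SignalType.CANCEL
                // here if the file should go away on cancellation as well
                if (signal == SignalType.ON_COMPLETE) {
                    deleteFile(path);
                }
            })
            .doOnCancel(() -> close());
}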

Apache Camel: Cached stream file deletion causing file not found errors

Scenario:
I am trying to stream and process some large XML files. These files are sent from a producer asynchronously.
producerTemplate.sendBodyAndHeaders(endpointUri, inStream, ImmutableMap.of(JOBID_PROPERTY, importJob.getId()));
I need to batch all file input streams, identify the files by probing them with xpath and reorder them according to their content. I have the following route:
from("direct:route1")
.streamCaching()
.choice()
.when(xpath("//Tag1")) .setHeader("execOrder", constant(3)) .setHeader("xmlRoute", constant( "direct:some-route"))
.when(xpath("//Tag2")) .setHeader("execOrder", constant(1)) .setHeader("xmlRoute", constant( "direct:some-other-route"))
.when(xpath("//Tag3")) .setHeader("execOrder", constant(2)) .setHeader("xmlRoute", constant( "direct:yet-another-route"))
.otherwise()
.to("direct:somewhereelse")
.end()
.resequence(header("execOrder"))
.batch(new BatchResequencerConfig(300, 10000L))
.allowDuplicates()
.recipientList(header("xmlRoute"))
When running my code I get the following error:
2017-11-23 11:43:13.442 INFO 10267 --- [ - Batch Sender] c.w.n.s.m.DefaultImportJobService : Updating entity ImportJob with id 5a16a61803af33281b22c716
2017-11-23 11:43:13.451 WARN 10267 --- [ - Batch Sender] org.apache.camel.processor.Resequencer : Error processing aggregated exchange: Exchange[ID-int-0-142-bcd-wsint-pro-59594-1511433568520-0-20]. Caused by: [org.apache.camel.RuntimeCamelException - Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp]
org.apache.camel.RuntimeCamelException: Cannot reset stream from file /var/folders/dc/fkrgdrnx6txbg7jfdjd_58mm0000gn/T/camel/camel-tmp-39abaae8-9bdd-435a-b63d-299ad8b06415/cos1499080503439465502.tmp
at org.apache.camel.converter.stream.FileInputStreamCache.reset(FileInputStreamCache.java:91)
I've read here that the FileInputStreamCache is closed when XPathBuilder.getDocument() is called, and the temp file is deleted, so you get the FileNotFoundException when the XPathBuilder wants to reset the InputStream.
The solution seems to be to disable spooling to disk like this:
camelContext.getStreamCachingStrategy().setSpoolThreshold(-1);
However, I don't want to do that because of RAM restrictions, i.e. files can get up to 600MB and I don't want to keep them in memory. Any ideas how to solve the problem?
The resequencer is a two-leg (stateful) pattern and will cause the original exchange to be completed beforehand, as it keeps a copy in memory while re-sequencing until the gap is filled, and then sends the messages out in the new order.
Since your input stream comes from some HTTP service, it may already be closed before the resequencer outputs the exchange.
Either store the files to local disk first, as suggested, and then let the resequencer work on those, or find a way not to use the resequencer.
I ended up doing what Claus and Ricardo suggested. I made a separate route which saves the files to disk. Then another one which probes the files and resequences the exchanges according to a fixed order.
String xmlUploadDirectory = "file://" + Files.createTempDir().getPath() + "/xmls?noop=true";

from("direct:route1")
    .to(xmlUploadDirectory);

from(xmlUploadDirectory)
    .choice()
        .when(xpath("//Tag1")).setHeader("execOrder", constant(3)).setHeader("xmlRoute", constant("direct:some-route"))
        .when(xpath("//Tag2")).setHeader("execOrder", constant(1)).setHeader("xmlRoute", constant("direct:some-other-route"))
        .when(xpath("//Tag3")).setHeader("execOrder", constant(2)).setHeader("xmlRoute", constant("direct:yet-another-route"))
        .otherwise()
            .to("direct:somewhereelse")
    .end()
    .to("direct:resequencing");

from("direct:resequencing")
    .resequence(header("execOrder"))
        .batch(new BatchResequencerConfig(300, 10000L))
        .allowDuplicates()
    .recipientList(header("xmlRoute"));

Launch a batch job based on Json Parameter in Spring-XD

I assume that this is not possible as it is not listed in the XD documentation.
What I am looking for is a way to launch a job dynamically from a RabbitMQ message that contains the jobName in the payload. This would allow me to have a single job queue where all my jobs are sent rather than having to have a separate queue of each job.
{
    "jobName": "myJob",
    "jobParm1": "parm1",
    "jobParm2": "parm2"
}
This post shows that it is possible using http.
You could construct an XD stream that reads from rabbit, transforms the payload, and invokes the http-client processor (dumping the output to the null or log sink).

How can I switch between 2 groups of handlers in netty

I am trying to write a client-server application that communicates using Message objects (the Message class is defined in my application). There is a scenario in which I want to transfer a file between them. First I must send a message to the client to notify it about specific file information, and after that the file itself is going to be written to the channel.
The problem is: how can I handle this scenario in the client?
Is it a good solution to remove the Message handler after receiving the message and replace it with a byte-array handler?
What are the alternatives?
Sure, you can just modify the pipeline on the fly. We do something similar in our port unification example [1].
[1] https://github.com/netty/netty/blob/4.0/example/src/main/java/io/netty/example/portunification/PortUnificationServerHandler.java
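A rough sketch of that idea for Netty 4.x: once the file-info message arrives, the handler swaps itself out for a raw-content handler. Message, FileInfoMessage and FileContentHandler are illustrative application types, not names from the question:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.SimpleChannelInboundHandler;

public class MessageDispatchHandler extends SimpleChannelInboundHandler<Message> {

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Message msg) {
        if (msg instanceof FileInfoMessage) {
            ChannelPipeline pipeline = ctx.pipeline();
            // Switch the pipeline to "file mode": the bytes that follow are raw file content.
            pipeline.addLast("fileContent", new FileContentHandler((FileInfoMessage) msg));
            pipeline.remove(this);
            // Any decoder earlier in the pipeline that turns bytes into Message objects would
            // need to be removed or bypassed as well, so the raw bytes reach FileContentHandler.
        } else {
            // handle other Message types as before
        }
    }
}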

Can Java Play framework store more than 100KB of a request in memory without writing to disk

I'm working on a throughput-intensive application, and in trying to identify the bottlenecks, I found that writing the request body to disk and then reading it back in when the entire request is received is a pretty slow operation.
My sending side will send me up to 512KB of data in one HTTP POST and that can't be changed, so I'm looking for ways of handling it better on the server. In the debugger I can see that Play uses a RawBuffer class to manage incoming data, and that class has a memoryThreshold field which is currently set to 100KB. Does anyone know of a way, programmatically or via a configuration file, to change this default to 512KB?
Update:
Things I've tried with no success:
Entering "parsers.text.maxLength=512K" in the application.conf file.
Just for kicks and giggles "parsers.raw.maxLength=512K" and "parsers.raw.memoryThreshold=512K" in application.conf
Adding "#BodyParser.Of( value = BodyParser.Raw.class, maxLength = 512 * 1024 )" annotation to my action method.
All three application.conf property names above with "512288" instead of "512K"
If I look at the code I read
lazy val DEFAULT_MAX_TEXT_LENGTH: Int = Play.maybeApplication.flatMap { app =>
  app.configuration.getBytes("parsers.text.maxLength").map(_.toInt)
}.getOrElse(1024 * 100)
which makes me think that the parsers.text.maxLength property is the one you are looking for.
