I have two business logic steps:
1. Download the XML from an external resource, then parse and transform it into objects.
2. Dispatch the output (the object list) to an external queue.
@Bean
public Job job() throws Exception {
    return this.jobs.get("job")
            .start(getXmlViaHttpStep())
            .next(pushMessageToQueue())
            .build();
}
So my first step is a Tasklet which downloads the file (via HTTP) and converts it into objects.
My second step is another Tasklet that is supposed to dispatch the output from the previous step.
Now how do I pass the output list from step 1 into step 2 (as its input)?
I could save it to a temp file, but isn't there a better-practice approach for this?
I can see at least two options that are both viable.
Option 1: set up the job as one step
You can set up your job to contain one step where the reader simply reads the input from your URL and the writer posts to your queue, as sketched below.
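For illustration, a minimal sketch of Option 1 in Java config, assuming Spring Batch 4; the Item type and the urlReader/queueWriter beans are placeholders for your own XML reader and JMS writer:

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.jms.JmsItemWriter;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.context.annotation.Bean;

@Bean
public Step downloadAndDispatchStep(StepBuilderFactory steps,
        StaxEventItemReader<Item> urlReader, JmsItemWriter<Item> queueWriter) {
    return steps.get("downloadAndDispatch")
            .<Item, Item>chunk(100)    // read/write in chunks of 100 items
            .reader(urlReader)         // e.g. a StaxEventItemReader over a UrlResource
            .writer(queueWriter)       // posts each item of the chunk to the queue
            .build();
}

@Bean
public Job singleStepJob(JobBuilderFactory jobs, Step downloadAndDispatchStep) {
    return jobs.get("job").start(downloadAndDispatchStep).build();
}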
Option 2: set up the job as two steps with intermediate storage
However, you may want to divide the job into two steps to be able to re-run a step if it fails, to simplify debugging, etc. In that case, the following approach may work for you:
Step 1: Create a step where a FlatFileItemReader or similar downloads the file. The step can then use a FlatFileItemWriter to persist the contents to disk.
Step 2: Open the file produced by the ItemWriter of the previous step. One alternative is to use the org.springframework.batch.item.xml.StaxEventItemReader together with a Jaxb2Marshaller to handle the processing (as described in this blog). Configure the output step to post messages to a queue by using e.g. org.springframework.batch.item.jms.JmsItemWriter. The writer is (as always) chunked, so multiple messages can be posted for each write. A sketch of this second step follows.
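A hedged sketch of the Step 2 wiring under those assumptions; the file path, fragment name, and Item class are placeholders:

import org.springframework.batch.item.jms.JmsItemWriter;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;
import org.springframework.jms.core.JmsTemplate;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Bean
public StaxEventItemReader<Item> fileReader() {
    Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
    marshaller.setClassesToBeBound(Item.class);

    StaxEventItemReader<Item> reader = new StaxEventItemReader<>();
    reader.setResource(new FileSystemResource("build/step1-output.xml")); // file written in step 1
    reader.setFragmentRootElementName("item"); // the repeating XML element to unmarshal
    reader.setUnmarshaller(marshaller);
    return reader;
}

@Bean
public JmsItemWriter<Item> queueWriter(JmsTemplate jmsTemplate) {
    JmsItemWriter<Item> writer = new JmsItemWriter<>();
    writer.setJmsTemplate(jmsTemplate); // the template carries the default destination (queue)
    return writer;
}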
Personally, I would probably set up the whole thing as Option 2. I find that simple steps without too many transformations are easier to follow and also easier to test, but that is just a matter of taste.
Related
I am new to Spring Batch and I have a peculiar problem. I want to get results from 3 different JPA queries with a JpaPagingItemReader, process them individually, and write them into one consolidated XML file using a StaxEventItemWriter.
For example, the resultant XML would look like:
<root>
<query1>
...
</query1>
<query2>
...
</query2>
<query3>
...
</query3>
</root>
Please let me know how to achieve this.
Also, I currently implemented my configuration with one query, but the reader/writer is also quite slow: it took around 59 minutes to generate a file of 20 MB, as I am running it in a single-threaded environment as of now, as opposed to a multithreaded one. If there are any other suggestions around this, please do let me know. Thanks.
EDIT:
I tried the following approach:
I created 3 different steps and added one reader, processor, and writer to each of them, but the problem I am facing now is that the writer is not able to write to the same file or append to it.
This is written in the StaxEventItemWriter class:
FileUtils.setUpOutputFile(file, restarted, false, overwriteOutput);
Here the 3rd argument, append, is hard-coded to false.
The second approach to your question seems like the right direction: you could create 3 different readers/processors/writers and write your own custom writer, which should extend AbstractFileItemWriter, where setting append is allowed; a sketch follows below. Also, I have seen that xmlWriter writes XML faster than StaxEventItemWriter, but there is some trade-off in writing boilerplate code.
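For illustration only, a rough sketch of that idea, assuming Spring Batch 4.1+ where AbstractFileItemWriter is public; the marshalling is a placeholder and the protected fields (append, shouldDeleteIfExists, lineSeparator) are assumptions based on that class:

import java.util.List;
import org.springframework.batch.item.support.AbstractFileItemWriter;

public class AppendingXmlFragmentWriter<T> extends AbstractFileItemWriter<T> {

    public AppendingXmlFragmentWriter() {
        setAppendAllowed(true); // the switch StaxEventItemWriter effectively hard-codes to false
        setName("appendingXmlFragmentWriter");
    }

    @Override
    public void afterPropertiesSet() {
        if (append) {
            shouldDeleteIfExists = false; // mirrors FlatFileItemWriter's append handling
        }
    }

    @Override
    protected String doWrite(List<? extends T> items) {
        StringBuilder fragment = new StringBuilder();
        for (T item : items) {
            fragment.append(toXml(item)).append(lineSeparator);
        }
        return fragment.toString();
    }

    private String toXml(T item) {
        return item.toString(); // placeholder: plug in your own marshalling, e.g. JAXB
    }
}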
One option off the top of my head is to:
- create a StaxEventItemWriter
- create 3 instances of a step that has a JpaPagingItemReader and writes the corresponding <queryX>...</queryX> section to the shared writer
- write the <root> and </root> tags in a JobExecutionListener, so the steps don't care about the envelope
There are other considerations here, like whether it is always 3 queries, etc., but the general idea is to separate concerns between processors, steps, jobs, tasklets, and listeners to make each perform a clear piece of work. A hypothetical listener is sketched below.
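A hypothetical listener along those lines; the output path is a placeholder and error handling is minimal:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class RootTagListener implements JobExecutionListener {

    private final Path output = Paths.get("build/consolidated.xml"); // shared output file

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Start a fresh file with the opening envelope tag before any step runs.
        write("<root>\n", StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Close the envelope once all three query steps have appended their sections.
        write("</root>\n", StandardOpenOption.APPEND);
    }

    private void write(String tag, StandardOpenOption... options) {
        try {
            Files.write(output, tag.getBytes(StandardCharsets.UTF_8), options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}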
Use JVisualVM to monitor the bottlenecks inside your application.
Since you said it is taking 59 minutes to create a file of 20 MB, you will get better insight into where you are taking performance hits.
VisualVM tutorial
Open VisualVM, connect your application => Sampler => CPU => CPU Samples.
Take snapshots at various times and analyse where it is spending the most time. Just by checking this you will get enough data for optimisation.
Note: JVisualVM comes with the Oracle JDK 8 distribution; you can simply type jvisualvm at the command prompt/terminal. If not, download it from here.
I have written a Spring Batch program to read/process/write data into a single file. I have a new business requirement wherein, from the same data I am reading, I have to build another list with different data, process/format that data, and write it to a separate file.
I have looked into MultiFormatItemWriter, in which I can define separate FlatFileItemWriters, and into CompositeItemWriter as well, but I am unable to understand how to send different lists to these different file writers.
Please do suggest some options with sample code if possible.
A combination of ClassifierCompositeItemProcessor and ClassifierCompositeItemWriter is what you are looking for. The classifier allows you to route items to the right processor/writer based on their class.
You can find an example here; a minimal sketch is also shown below.
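For illustration, a minimal sketch assuming two item subtypes (TypeA and TypeB of a common Report base class); the names and delegate writers are placeholders:

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.classify.Classifier;
import org.springframework.context.annotation.Bean;

@Bean
public ClassifierCompositeItemWriter<Report> classifierWriter(
        FlatFileItemWriter<Report> typeAWriter, FlatFileItemWriter<Report> typeBWriter) {
    ClassifierCompositeItemWriter<Report> writer = new ClassifierCompositeItemWriter<>();
    // Route each item to the delegate writer that matches its class.
    writer.setClassifier((Classifier<Report, ItemWriter<? super Report>>)
            item -> item instanceof TypeA ? typeAWriter : typeBWriter);
    return writer;
}

Note that when the delegates are FlatFileItemWriters, they should also be registered as streams on the step so they are opened and closed properly.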
I have written a Spring Boot microservice using RxJava (an aggregated service) to implement the following simplified use case. The big picture is: when an instructor uploads a course content document, a set of questions should be generated and saved.
1. A user uploads a document to the system.
2. The system calls a Document Service to convert the document into text.
3. It then calls another question-generating service to generate a set of questions from that text content.
4. Finally, these questions are posted to a basic CRUD microservice to be saved.
When a user uploads a document, lots of questions are created from it (maybe hundreds or so). The problem here is that I am posting questions one at a time, sequentially, for the CRUD service to save them. This slows down the operation drastically due to IO-intensive network calls; hence it takes around 20 seconds to complete the entire process. Here is the current code, assuming all the questions are formulated.
questions.flatMapIterable(list -> list)
        .flatMap(q -> createQuestion(q))
        .toList();

private Observable<QuestionDTO> createQuestion(QuestionDTO question) {
    return Observable.<QuestionDTO>create(sub -> {
        // Blocking REST call that posts a single question to the CRUD service
        QuestionDTO questionCreated = restTemplate.postForEntity(QUESTIONSERVICE_API,
                new org.springframework.http.HttpEntity<QuestionDTO>(question), QuestionDTO.class).getBody();
        sub.onNext(questionCreated);
        sub.onCompleted();
    }).doOnNext(s -> log.debug("Question was created successfully."))
      .doOnError(e -> log.error("An ERROR occurred while creating a question: " + e.getMessage()));
}
Now my requirement is to post all the questions to the CRUD service in parallel and merge the results on completion. Also note that the CRUD service will accept only one question object at a time, and that cannot be changed. I know that I can use the Observable.zip operator for this purpose, but I have no idea how to apply it in this context, since the actual number of questions is not predetermined. How can I change the first statement of the code above so that I can improve the performance of the application? Any help is appreciated.
By default, the observables in flatMap operate on the same scheduler that you subscribed on. In order to run your createQuestion observables in parallel, you have to subscribe to them on a computation scheduler.
questions.flatMapIterable(list -> list)
.flatMap(q -> createQuestion(q).subscribeOn(Schedulers.computation()))
.toList();
Check this article for a full explanation.
I was reading the Spring documentation for the Spring Batch project, and I want to know if there is an out-of-the-box configuration to chain steps, meaning the output of the first step becomes the input for the second one, and so on.
I'm not asking about step flows, where one step executes after another; it is more about using the output of a step's item processor as the input of the next step.
What I have in mind is to use a normal step with a reader and processor, and in the writer create a flat file that could be read by the second reader in the next step, but this seems inefficient as it needs to write objects that are already in the JVM and restore them with the second reader.
I am not sure whether this is possible with normal Spring config, or whether the JSR simply does not work the way I want.
Instead of multiple steps, use multiple ItemProcessors in a chain. You can chain them using a CompositeItemProcessor, as in the sketch below.
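A minimal sketch of that chain; the Order type and the two delegate processors are placeholders:

import java.util.Arrays;
import java.util.List;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.context.annotation.Bean;

@Bean
public CompositeItemProcessor<Order, String> compositeProcessor() {
    // Filter invalid items first (returning null skips the item)...
    ItemProcessor<Order, Order> validate = order -> order.isValid() ? order : null;
    // ...then transform the surviving items into the output format.
    ItemProcessor<Order, String> format = order -> order.toCsvLine();

    CompositeItemProcessor<Order, String> composite = new CompositeItemProcessor<>();
    List<ItemProcessor<?, ?>> delegates = Arrays.asList(validate, format);
    composite.setDelegates(delegates); // output of each delegate feeds the next
    return composite;
}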
EDIT:
I was reading about the Spring Batch strategies and I did not find any out-of-the-box XML configuration to chain steps in a kind of pipeline. The best option that fits my needs is to use ItemProcessorAdapter to run the different logic I need in the steps and to use CompositeItemProcessor (6.21) to make a chain of them.
I am using the MultiResourceItemReader class of Spring Batch, which uses a FlatFileItemReader bean as a delegate. My files contain XML requests; my batch job reads the requests from the files, posts them to a URL, and writes the responses to corresponding output files. I want to define one thread for each file being processed, to decrease execution time. In my current requirement I have four input files, so I want to define four threads to read, process, and write the files. I tried a simpleTaskExecutor with
task-executor="simpleTaskExecutor" throttle-limit="20"
But after using this, the FlatFileItemReader throws an exception.
I am a beginner; please suggest how to implement this. Thanks in advance.
There are a couple of ways to go here. However, the easiest would be to partition by file using the MultiResourcePartitioner. That, in combination with the TaskExecutorPartitionHandler, will give you reliable parallel processing of your input files. You can read more about partitioning in section 7.4 of our documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/scalability.html. A minimal sketch of this setup follows.
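A hedged sketch of that setup in Java config; the file pattern, item type, and grid size are assumptions based on the four-file scenario:

import java.io.IOException;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.ResourcePatternResolver;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Bean
public Step masterStep(StepBuilderFactory steps, Step workerStep,
        ResourcePatternResolver resolver) throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(resolver.getResources("file:input/*.xml")); // one partition per file

    return steps.get("masterStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)
            .gridSize(4) // four input files, four partitions
            .taskExecutor(new SimpleAsyncTaskExecutor()) // runs partitions via a TaskExecutorPartitionHandler
            .build();
}

@Bean
@StepScope
public FlatFileItemReader<String> reader(
        @Value("#{stepExecutionContext['fileName']}") Resource file) {
    // Step-scoped: each partition/thread gets its own reader bound to one file,
    // which avoids sharing the non-thread-safe FlatFileItemReader across threads.
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(file);
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}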