I am new to Spring Batch development and have the following requirement.
There will be an S3 source with zip files, and each zip file will contain multiple PDF files and XML files (e.g. 100 PDFs and 100 XML files; each XML file contains data about its PDF).
The batch job needs to read each PDF file together with its associated XML file and push the pair to a REST service/DB.
When I looked at examples, most of them covered how to read a line from a file and process it. Here the items themselves are files. I want to read one PDF file (as bytes) plus its XML file (converted into a POJO) as a set and push them to the REST service one by one.
Right now I am doing all the reading and processing inside a single tasklet, but I am sure there is a better way to implement it. Please suggest, and thank you.
The chunk-oriented processing model requires you to first define what an item is. In your case, one option is to consider an item as the PDF file (data) together with its associated XML file (metadata). You can create a class that represents such an item and a custom item reader for it. Once that is in place, you can use the reader in a chunk-oriented step with a processor or writer that sends data to your REST endpoint.
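A minimal sketch of such an item and reader, assuming the zip has already been downloaded and extracted locally and that each foo.pdf sits next to its foo.xml; the class names, the JAXB-annotated PdfMetadata POJO, and the naming convention are illustrative assumptions, not prescribed API:

```java
import java.io.File;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.List;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

import org.springframework.batch.item.ItemReader;

// PdfDocumentItem.java -- one PDF (data) plus its unmarshalled XML (metadata)
public class PdfDocumentItem {
    private final byte[] pdfBytes;
    private final PdfMetadata metadata; // hypothetical JAXB-annotated POJO

    public PdfDocumentItem(byte[] pdfBytes, PdfMetadata metadata) {
        this.pdfBytes = pdfBytes;
        this.metadata = metadata;
    }

    public byte[] getPdfBytes() { return pdfBytes; }
    public PdfMetadata getMetadata() { return metadata; }
}

// PdfDocumentItemReader.java -- returns one PDF/XML pair per read() call
public class PdfDocumentItemReader implements ItemReader<PdfDocumentItem> {

    private final Iterator<File> pdfFiles;
    private final Unmarshaller unmarshaller;

    public PdfDocumentItemReader(List<File> pdfFiles) throws Exception {
        this.pdfFiles = pdfFiles.iterator();
        this.unmarshaller = JAXBContext.newInstance(PdfMetadata.class).createUnmarshaller();
    }

    @Override
    public PdfDocumentItem read() throws Exception {
        if (!pdfFiles.hasNext()) {
            return null; // null tells the framework the data set is exhausted
        }
        File pdf = pdfFiles.next();
        // Assumed convention: foo.pdf sits next to foo.xml
        File xml = new File(pdf.getParentFile(), pdf.getName().replaceFirst("\\.pdf$", ".xml"));
        return new PdfDocumentItem(
                Files.readAllBytes(pdf.toPath()),
                (PdfMetadata) unmarshaller.unmarshal(xml));
    }
}
```

The writer (or a processor in front of it) then receives complete PDF+XML items one at a time and can POST each one to the REST service, while the framework takes care of chunking, restartability and transaction boundaries.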
I have written a Spring Batch program to read/process/write data into a single file. I have a new business requirement: from the same data I am reading, I have to build another list with different data, process/format it, and write it onto a separate file.
I have looked into MultiFormatItemWriter, in which I can define separate FlatFileItemWriters, and at CompositeItemWriter as well, but I am unable to understand how to send different lists to these different file writers.
Please suggest some options, with sample code if possible.
A combination of ClassifierCompositeItemProcessor and ClassifierCompositeItemWriter is what you are looking for. The classifier allows you to route items to the right processor/writer based on their class.
You can find an example here.
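For illustration, a minimal sketch of the writer side, assuming two item classes ReportA and ReportB, each with its own pre-configured FlatFileItemWriter; all bean and class names here are made up:

```java
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ClassifierCompositeItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class WriterConfiguration {

    // Route each item to the writer that matches its class; reportAWriter and
    // reportBWriter are assumed to be FlatFileItemWriter beans defined elsewhere.
    @Bean
    public ClassifierCompositeItemWriter<Object> compositeWriter(
            ItemWriter<Object> reportAWriter, ItemWriter<Object> reportBWriter) {
        ClassifierCompositeItemWriter<Object> writer = new ClassifierCompositeItemWriter<>();
        writer.setClassifier(item -> item instanceof ReportA ? reportAWriter : reportBWriter);
        return writer;
    }
}
```

One caveat: ClassifierCompositeItemWriter does not open or close its delegates for you, so register the underlying FlatFileItemWriters as streams on the step to get their lifecycle handled.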
I currently have a Spring Batch job that does the following:
Reads a list of csv files using a MultiResourceItemReader which delegates to a FlatFileItemReader.
Splits each file into chunks and writes each chunk as a JMS message, with each message containing the list of lines in the chunk and the filename of the underlying resource in JSON format.
What I want is for each chunk to only contain lines from a single file resource so that the filename on the JMS message will link up to the corresponding file.
The problem is that when processing of one file resource is complete, the reader will just continue and process the next resource meaning that lines from multiple resource files are being inserted into the same chunk and the filename property will not necessarily match the underlying data in the chunk.
Is there any clean way to prevent the reader from including lines from separate file resources in the same chunk?
EDIT: I believe the solution will require using a custom chunk completion policy to somehow determine if the current item being read is from the same resource as the previous line; not sure how feasible this is, though. Any thoughts?
I changed my implementation to use MultiResourcePartitioner to create a partitioned step per file, and everything is working now.
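For anyone hitting the same issue, a minimal sketch of that setup; the bean names, file pattern and pass-through reader are illustrative:

```java
import java.io.IOException;

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;

// Inside a @Configuration class:

// Master step: one partition (one worker step execution) per file, so a chunk
// can never mix lines from two resources.
@Bean
public Step masterStep(StepBuilderFactory steps, Step workerStep) throws IOException {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(new PathMatchingResourcePatternResolver()
            .getResources("file:input/*.csv")); // illustrative pattern
    return steps.get("masterStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)
            .build();
}

// Step-scoped reader bound to the single file of the current partition;
// MultiResourcePartitioner exposes each resource URL under the key "fileName".
@Bean
@StepScope
public FlatFileItemReader<String> reader(
        @Value("#{stepExecutionContext['fileName']}") Resource resource) {
    FlatFileItemReader<String> reader = new FlatFileItemReader<>();
    reader.setResource(resource);
    reader.setLineMapper(new PassThroughLineMapper());
    return reader;
}
```

Since each partition reads exactly one resource, the filename from the step execution context can safely be stamped on every JMS message that partition produces.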
I have some logic already implemented that allows me to query for a set of records which contain an XML column. For each record I am to save the XML column data as a separate XML file as well as produce a CSV file of the same data. The problem is I can only do this for a limited number of records, as I start to run out of memory if too many records are being processed.
Currently the approach is:
To write the CSV, I take the InputStream for each XML column returned from the query (using SQLXML objects for this) and parse it with a DOM parser, along with some XPath, in order to identify just the elements which contain any sort of text (child nodes; I don't really care about parents). I then use the element name as the header and the text as the value in the CSV file. The data is written to a file using a BufferedWriter along with StringBuilders to hold the text (one for headers, one for values).
Saving the data to an XML file is accomplished by taking the same InputStream mentioned above, converting it to a String, and finally writing it out to a file using a BufferedWriter. This really isn't the nicest looking output in the end, as the entire XML document is placed on a single line when it's written to the file, but it works for now.
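To make the description concrete, here is a rough sketch of the CSV side as described; the XPath expression and CSV layout are illustrative guesses at the original code, not a copy of it:

```java
import java.io.BufferedWriter;
import java.io.InputStream;
import java.sql.SQLXML;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XmlColumnToCsv {

    // Parse one XML column and append a header row and a value row to the CSV.
    static void writeCsvRows(SQLXML xmlColumn, BufferedWriter csv) throws Exception {
        try (InputStream in = xmlColumn.getBinaryStream()) {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            // Leaf elements (no child elements) that contain non-empty text
            NodeList leaves = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                    "//*[not(*)][normalize-space(text())]", doc, XPathConstants.NODESET);
            StringBuilder headers = new StringBuilder();
            StringBuilder values = new StringBuilder();
            for (int i = 0; i < leaves.getLength(); i++) {
                Node leaf = leaves.item(i);
                if (i > 0) { headers.append(','); values.append(','); }
                headers.append(leaf.getNodeName());
                values.append(leaf.getTextContent().trim());
            }
            csv.write(headers.toString()); csv.newLine();
            csv.write(values.toString()); csv.newLine();
        }
    }
}
```

Note that a DOM parser keeps each whole document in memory, which is the most likely source of the memory pressure; a streaming parser (SAX or StAX) can pull out the same leaf elements without materialising the full tree.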
I am quite new when it comes to working with XML in Java, so I am just looking for any advice or input on possible alternatives or more efficient practices for this same process. I would like to be able to process at least 70 records at once.
Thanks in advance.
Hi, I am doing a POC/base design for reading from a database and writing to flat files. I am struggling with a couple of issues here, but first I will tell you the output format of the flat file.
Please let me know how to design the reader, where I need to read transactions from different tables, process the records, and figure out the summary fields, and then how I should design the item writer, which has such a complex layout. Please advise. I am able to read from a single table and write to a file successfully, but the above task looks complex.
Extend the FlatFileItemWriter to only open a file once and append to it instead of overwriting it. Then pass that same file writer to multiple readers in the order you would like them to appear. (Make sure that each object read by the readers implements something that the writer understands! Maybe BatchWriteable would be a good name for the interface.)
Some back-of-the-envelope pseudocode (a concrete Java sketch follows the outline):
Before everything starts:
    Open file.
    Write file headers.
Batch step (repeat as many times as necessary):
    Read batch section.
    Process batch section.
    Write batch section.
When done:
    Write file footer.
    Close file.
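For what it's worth, FlatFileItemWriter can already cover most of that outline without subclassing: setAppendAllowed(true) reopens the same file without truncating it, and the header/footer callbacks map to the "write file headers/footer" steps above. A minimal sketch, with the BatchWriteable interface made up as suggested:

```java
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.core.io.FileSystemResource;

// Hypothetical contract every record type implements so one writer can format all of them.
interface BatchWriteable {
    String toLine();
}

class AppendingWriterFactory {

    static FlatFileItemWriter<BatchWriteable> newWriter() {
        FlatFileItemWriter<BatchWriteable> writer = new FlatFileItemWriter<>();
        writer.setResource(new FileSystemResource("output/report.txt")); // illustrative path
        writer.setAppendAllowed(true); // reopen and append rather than overwrite
        writer.setHeaderCallback(w -> w.write("REPORT HEADER"));  // "Write file headers"
        writer.setFooterCallback(w -> w.write("REPORT FOOTER"));  // "Write file footer"
        writer.setLineAggregator(item -> item.toLine());
        return writer;
    }
}
```

One thing to watch: with append enabled the header is only written while the file is still empty, but the footer callback runs every time the writer closes, so if the same writer definition is reused across several sequential steps you may want to attach the footer only to the last one.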
I have created a J2ME application that reads and writes a text file.
Now, at the time of reading, I read one line and send it to the server. After that, I want to remove that line from the text file.
I am not sure how to do it. In some examples I found this solution: copy the original file's content into one object, remove that string from the object, then delete the original file and create a new one from the object.
I don't think that is a good approach. Is there any other way to do it?
Edit:
Actually, the problem is like this: one application writes some data to the text file, and my other application reads one line, sends it to the server, and removes that line.
Now if I go with the approach of copying to a new object, deleting the file, and writing a new file from the new object, I will run into a problem: if the file is deleted, the first application can't find it, so it may create a new file with only one record, while the second application creates a new file based on the new object, and my data will be lost.
Edit:
I even tried to do the same thing with RMS, but when both applications access the same RMS at the same time, all the data in the RMS file is cleared. The first application opens the RMS for writing and the second opens it for sync and delete, but at the moment both have it open, all the data is cleared.
Is it possible to set a lock on the RMS file from one application?
No, that's how you do it.
You can't delete a line from the beginning of a file. You would need to re-write the file without that line.
(Note that this is not specific to Java.)
As records are inserted, I was creating a single file per record in one specific folder.
Now, as each file is read by the background application and sent to the server, it is deleted by that application.
This solves the concurrency problem in the file read/write.
I know it is not a good approach, but I didn't find any other good approach.
Most file systems don't have a mechanism for deleting content from the middle of a file (and I'm pretty sure that's the case in J2ME). So the standard practice is: open a new file; copy the old file up to the point where the unwanted line goes; skip it; then copy the rest of the file. I know it sounds inelegant, but that's just how it is :)
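A sketch of that copy-and-skip approach, shown with standard java.io for brevity; on J2ME the same idea applies using the JSR 75 FileConnection API, with delete() and rename() for the final swap:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class LineRemover {

    // Rewrite the file into a temp copy, skipping one line, then swap the files.
    static void removeLine(File file, int lineToSkip) throws IOException {
        File tmp = new File(file.getParentFile(), file.getName() + ".tmp");
        try (BufferedReader in = new BufferedReader(new FileReader(file));
             BufferedWriter out = new BufferedWriter(new FileWriter(tmp))) {
            String line;
            int lineNo = 0;
            while ((line = in.readLine()) != null) {
                if (lineNo++ != lineToSkip) { // copy everything except the unwanted line
                    out.write(line);
                    out.newLine();
                }
            }
        }
        // Replace the original with the trimmed copy
        if (!file.delete() || !tmp.renameTo(file)) {
            throw new IOException("Could not replace " + file);
        }
    }
}
```

Note that this still has the swap window described in the question's edit; without file locking, the writing application can slip in between the delete and the rename, which is why the one-file-per-record workaround above avoids the problem entirely.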