I have a requirement where I need to read 4 different CSV files, line by line. The files have different numbers of columns and values. After processing, I have to generate the output in XML.
Could someone please shed some light on how to achieve this?
Thanks
Spring Batch has a reader interface. You can write your own reader class that holds 4 individual FlatFileItemReaders and reads from them until all are exhausted. The writer is also an interface that you can implement yourself: you override the write method and do whatever you need to do there.
http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/ItemReader.html
http://docs.spring.io/spring-batch/trunk/apidocs/org/springframework/batch/item/ItemWriter.html
http://docs.spring.io/spring-batch/reference/html/readersAndWriters.html
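A minimal sketch of that idea, assuming a FieldSet item type so the differing column counts don't matter; the class and variable names are illustrative:

    import java.util.List;

    import org.springframework.batch.item.ItemReader;
    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.transform.FieldSet;

    // Illustrative reader that drains four FlatFileItemReaders in sequence.
    // Each delegate must still be opened, e.g. by registering it as a stream
    // on the step (or by managing open/close in an ItemStreamReader).
    public class MultiFileReader implements ItemReader<FieldSet> {

        private final List<FlatFileItemReader<FieldSet>> delegates;
        private int current = 0;

        public MultiFileReader(List<FlatFileItemReader<FieldSet>> delegates) {
            this.delegates = delegates;
        }

        @Override
        public FieldSet read() throws Exception {
            while (current < delegates.size()) {
                FieldSet item = delegates.get(current).read();
                if (item != null) {
                    return item;   // hand this row to the processor/writer
                }
                current++;         // current file is exhausted, move to the next
            }
            return null;           // null tells Spring Batch the input is finished
        }
    }

For the XML side, Spring Batch ships a StaxEventItemWriter that can serialize the processed items to an XML file, or you can implement ItemWriter yourself as described above.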
I am very new to Java and have been tasked with using Spring Batch to read in some text files. So far, the Spring Batch resources online have helped me get to the point where I am reading, processing, and writing some simple test .csv files into Mongo.
The problem I have now is that the actual file I would like to read has over 600 columns, meaning that with the way I currently read the file into Java, I would need 600+ fields in my @Document Mongo model.
I have been thinking of a couple of ways to get around this. First, I thought I could read in each line as a string and then, in my processor, split everything up and format the data to return a list for my MongoTemplate, but returning a List is not viable from the overridden process method.
So my question to you guys is,
What is the best way to handle reading in files with hundreds of columns in Spring Batch? Or what would be the best resource to start reading to help point me in the right direction?
Thanks!
I had the same problem. I used OpenCSV's CSVReader for reading the CSVs:
http://opencsv.sourceforge.net/apidocs/com/opencsv/CSVReader.html
I suggest you use a Map instead of 600 Java fields. Besides, 600x600 Java strings is not a big deal for Java, nor for Mongo.
To work with Mongo, use Jongo: http://jongo.org/
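A minimal sketch of the Map idea with OpenCSV; the file name is a placeholder:

    import java.io.FileReader;
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    import com.opencsv.CSVReader;

    public class WideCsvToMaps {
        public static void main(String[] args) throws Exception {
            try (CSVReader reader = new CSVReader(new FileReader("wide.csv"))) {
                String[] header = reader.readNext();           // the 600+ column names
                List<Map<String, String>> records = new ArrayList<>();
                String[] row;
                while ((row = reader.readNext()) != null) {
                    Map<String, String> record = new LinkedHashMap<>();
                    for (int i = 0; i < header.length && i < row.length; i++) {
                        record.put(header[i], row[i]);         // column name -> value
                    }
                    records.add(record);                       // ready for Mongo
                }
                System.out.println(records.size() + " records read");
            }
        }
    }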
If you really need batch processing of the data, your flow should be:
Loop (divide into batches, say 300 rows per iteration):
Read the next 300 rows (as Java objects, or into a Map) from the file into memory.
Sanitize or process them if needed.
Store them in MongoDB.
Return at EOF.
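A rough sketch of that loop, reusing the OpenCSV reader from the sketch above; toRecord and sanitize are hypothetical helpers, and the Jongo MongoCollection is assumed to be configured elsewhere:

    // Illustrative chunked load: toRecord/sanitize are hypothetical helpers.
    static void loadInChunks(CSVReader reader, MongoCollection records) throws Exception {
        final int batchSize = 300;
        String[] header = reader.readNext();
        List<Map<String, String>> batch = new ArrayList<>(batchSize);
        String[] row;
        while ((row = reader.readNext()) != null) {
            batch.add(toRecord(header, row));      // column-name -> value Map
            if (batch.size() == batchSize) {
                sanitize(batch);                   // process if needed
                records.insert(batch.toArray());   // store the chunk in MongoDB
                batch.clear();                     // keep memory bounded
            }
        }
        if (!batch.isEmpty()) {                    // EOF: flush the final partial chunk
            sanitize(batch);
            records.insert(batch.toArray());
        }
    }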
I ended up just reading in each line as a String object. Then, in the processor, I loop over the String with a delimiter, creating my Mongo repository objects and storing them. So I am basically doing all of the writing inside the processor method, which I would say is definitely not best practice, but it gives me the desired end result.
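A rough sketch of what that processor might look like; ColumnValueRepository and ColumnValue are hypothetical stand-ins for the Mongo repository objects mentioned above, and the delimiter is assumed to be a comma:

    import org.springframework.batch.item.ItemProcessor;

    public class LineSplittingProcessor implements ItemProcessor<String, String> {

        private final ColumnValueRepository repository; // hypothetical Mongo repository

        public LineSplittingProcessor(ColumnValueRepository repository) {
            this.repository = repository;
        }

        @Override
        public String process(String line) {
            String[] values = line.split(",");      // delimiter is an assumption
            for (int i = 0; i < values.length; i++) {
                // hypothetical document type; persisting directly from the processor
                // is the "not best practice" part acknowledged above
                repository.save(new ColumnValue(i, values[i]));
            }
            return line; // pass the raw line through so the step still has output
        }
    }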
I am using the MultiResourceItemReader class of Spring Batch, which uses a FlatFileItemReader bean as its delegate. My files contain XML requests; my batch reads the requests from the files, hits a URL with each, and writes the response to a corresponding output file. I want to dedicate one thread to each file's processing to decrease execution time. In my current requirement I have four input files, so I want four threads to read, process, and write them. I tried a simpleTaskExecutor with
task-executor="simpleTaskExecutor" throttle-limit="20"
But after using this, the FlatFileItemReader throws an exception.
I am a beginner; please suggest how to implement this. Thanks in advance.
There are a couple of ways to go here. However, the easiest would be to partition by file using the MultiResourcePartitioner. That, in combination with the TaskExecutorPartitionHandler, will give you reliable parallel processing of your input files. You can read more about partitioning in section 7.4 of our documentation here: http://docs.spring.io/spring-batch/trunk/reference/html/scalability.html
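A minimal Java-config sketch of that setup; bean names, the file pattern, and the grid size are illustrative:

    import java.io.IOException;

    import org.springframework.batch.core.Step;
    import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
    import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
    import org.springframework.batch.core.partition.support.Partitioner;
    import org.springframework.context.annotation.Bean;
    import org.springframework.core.io.support.ResourcePatternResolver;
    import org.springframework.core.task.SimpleAsyncTaskExecutor;

    public class PartitionConfig {

        @Bean
        public Partitioner filePartitioner(ResourcePatternResolver resolver) throws IOException {
            MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
            // one partition (and therefore one worker step execution) per file
            partitioner.setResources(resolver.getResources("file:input/*.xml"));
            return partitioner;
        }

        @Bean
        public Step masterStep(StepBuilderFactory steps, Partitioner filePartitioner, Step workerStep) {
            return steps.get("masterStep")
                    .partitioner("workerStep", filePartitioner)
                    .step(workerStep)                            // reads, hits the URL, writes
                    .taskExecutor(new SimpleAsyncTaskExecutor()) // run partitions concurrently
                    .gridSize(4)                                 // four files, four threads
                    .build();
        }
    }

The worker step's FlatFileItemReader should then be step-scoped, reading its resource from #{stepExecutionContext['fileName']} (the key MultiResourcePartitioner populates), so every thread gets its own reader instance; sharing a single FlatFileItemReader across threads is the usual cause of the exception described in the question.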
I need to read an Excel (.xls) file stored on a Hadoop cluster. I did some research and found out that I need to create a custom InputFormat for that. I have read many articles, but none of them is helpful from a programming point of view. Could someone help me with sample code for writing a custom InputFormat, so that I can understand the basics of programming an InputFormat and can use the Apache POI library to read the Excel file?
I have already written a MapReduce program for reading a text file. Now I need help with where the custom InputFormat code would go in relation to the MapReduce program I have already written, even once I manage to code it.
PS: converting the .xls file into a .csv file is not an option.
Yes, you should create a RecordReader to read each record from your Excel document. Inside that RecordReader you should use a POI-like API to read from the Excel file. More precisely, follow these steps:
Extend FileInputFormat to create your own CustomInputFormat and override getRecordReader.
Create a CustomRecordReader by extending RecordReader; here you have to write how to generate a key/value pair from a given FileSplit.
So first read bytes from the FileSplit, and from those buffered bytes read out the desired key and value using POI.
You can check my own CustomInputFormat and RecordReader for dealing with custom data objects here:
myCustomInputFormat
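A minimal sketch of those steps using the newer mapreduce API (createRecordReader rather than getRecordReader) and POI's HSSFWorkbook; the class names and the key/value choice are illustrative:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Iterator;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.ss.usermodel.Cell;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Workbook;

    public class ExcelInputFormat extends FileInputFormat<LongWritable, Text> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // an .xls workbook cannot be split into byte ranges
        }

        @Override
        public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
                                                                   TaskAttemptContext context) {
            return new ExcelRecordReader();
        }

        public static class ExcelRecordReader extends RecordReader<LongWritable, Text> {
            private Iterator<Row> rows;
            private final LongWritable key = new LongWritable();
            private final Text value = new Text();
            private long rowNum;

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException {
                FileSplit fileSplit = (FileSplit) split;
                FileSystem fs = fileSplit.getPath().getFileSystem(context.getConfiguration());
                try (InputStream in = fs.open(fileSplit.getPath())) {
                    Workbook workbook = new HSSFWorkbook(in); // POI loads the .xls into memory
                    rows = workbook.getSheetAt(0).iterator();
                }
            }

            @Override
            public boolean nextKeyValue() {
                if (rows == null || !rows.hasNext()) return false;
                Row row = rows.next();
                StringBuilder line = new StringBuilder();
                for (Cell cell : row) {                  // one tab-separated value per cell
                    if (line.length() > 0) line.append('\t');
                    line.append(cell.toString());
                }
                key.set(rowNum++);                       // key: row number
                value.set(line.toString());              // value: the row's cells
                return true;
            }

            @Override public LongWritable getCurrentKey() { return key; }
            @Override public Text getCurrentValue() { return value; }
            @Override public float getProgress() { return rows != null && rows.hasNext() ? 0f : 1f; }
            @Override public void close() { }
        }
    }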
Your research is correct: you need a custom InputFormat for Hadoop. If you are lucky, somebody has already created one for your use case.
If not, I would suggest looking for a Java library that is able to read Excel files. Since Excel is a proprietary file format, it is unlikely that you will find an implementation that works perfectly.
Once you have found a library that can read Excel files, integrate it with the InputFormat. To do so, you have to extend Hadoop's FileInputFormat. The RecordReader returned by your ExcelInputFormat must return the rows from your Excel file. You probably also have to override the getSplits() method to tell the framework not to split the file at all.
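On the question of where the code goes: the InputFormat is a separate class, and you only wire it into the driver of the MapReduce program you already have. A hedged sketch, with the path and class names as placeholders:

    // Hypothetical driver wiring; only the input-format line and the input
    // path change relative to an existing text-based MapReduce job.
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "excel-read");
    job.setJarByClass(MyExistingDriver.class);        // your current driver class
    job.setInputFormatClass(ExcelInputFormat.class);  // the custom format sketched above
    FileInputFormat.addInputPath(job, new Path("/data/report.xls"));
    // mapper, reducer, and output configuration stay as in your existing program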
Hi, I am doing a POC/design baseline for reading from a database and writing into flat files. I am struggling with a couple of issues here, but first I will describe the output format of the flat file.
Please let me know how to design the reader, where I need to read the transactions from different tables, process the records, and figure out the summary fields, and how I should design the ItemWriter, which has such a complex layout. Please advise. I can successfully read from a single table and write to a file, but the task above looks complex.
Extend the FlatFileItemWriter so it opens the file once and appends to it instead of overwriting it. Then pass that same file writer to multiple readers, in the order you would like their output to appear. (Make sure that each object read by the readers is assignable to something the writer understands; maybe an interface named BatchWriteable would be a good fit. A sketch follows the pseudocode below.)
Some back-of-the-envelope pseudocode:
Before everything starts:
    Open the file.
    Write the file headers.
Batch step (repeat as many times as necessary):
    Read a batch section.
    Process the batch section.
    Write the batch section.
When done:
    Write the file footer.
    Close the file.
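A minimal sketch of the writer side, assuming Spring Batch's FlatFileItemWriter; its appendAllowed flag avoids subclassing just to keep appending across steps, and BatchWriteable plus the file name are illustrative:

    import org.springframework.batch.item.file.FlatFileItemWriter;
    import org.springframework.context.annotation.Bean;
    import org.springframework.core.io.FileSystemResource;

    // Hypothetical interface from the suggestion above: anything the writer understands.
    interface BatchWriteable {
        String toRecordLine();
    }

    public class ReportWriterConfig {

        @Bean
        public FlatFileItemWriter<BatchWriteable> reportWriter() {
            FlatFileItemWriter<BatchWriteable> writer = new FlatFileItemWriter<>();
            writer.setResource(new FileSystemResource("output/report.txt")); // placeholder path
            writer.setAppendAllowed(true);                          // don't truncate between steps
            writer.setHeaderCallback(w -> w.write("FILE-HEADER"));  // written once, on file creation
            writer.setLineAggregator(BatchWriteable::toRecordLine); // each item renders its own line
            // setFooterCallback(...) exists too, but it fires every time the writer
            // closes, so attach the footer only to the writer used by the final step.
            return writer;
        }
    }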
I have about 100 different text files in the same format. Each file has observations about certain events at certain time periods for a particular person. I am trying to parse out the relevant information for a few individuals from each of the text files. Ideally, I want to parse all of this information and create a CSV file (eventually to be imported into Excel) with everything I need from ALL the text files. Any help or ideas would be greatly appreciated. I would prefer to use Java, but simpler methods are also welcome.
The log files are structured as below (data changed to preserve private information):
|||ID||NAME||TIME||FIRSTMEASURE^DATA||SECONDMEASURE^DATA|| etc...
TIME appears like 20110825111501, meaning 2011-08-25 11:15:01 AM.
Here are the steps in Java:
Open the file using a FileReader.
You can also wrap the FileReader in a BufferedReader and use its readLine() method to read the file line by line.
Parse each line. You know the data definition of each line best; various String functions or Java regex can help here.
Do the same for the date; check whether DateFormat (or SimpleDateFormat) can help.
Once you have parsed the data, you can start building your CSV file using the CSVParser mentioned in the other answer, or write it yourself using a FileOutputStream.
When you are ready to convert to Excel, you can use Apache POI.
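A minimal sketch tying those steps together; the file names are placeholders and the field positions are assumptions based on the sample line above:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.PrintWriter;
    import java.text.SimpleDateFormat;

    public class LogToCsv {
        public static void main(String[] args) throws Exception {
            SimpleDateFormat in = new SimpleDateFormat("yyyyMMddHHmmss");
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            try (BufferedReader reader = new BufferedReader(new FileReader("events.log"));
                 PrintWriter csv = new PrintWriter(new FileWriter("events.csv"))) {
                csv.println("id,name,time,measure,data");
                String line;
                while ((line = reader.readLine()) != null) {
                    // "|||ID||NAME||TIME||MEASURE^DATA||..." -> split on runs of '|';
                    // f[0] is empty because the line starts with delimiters
                    String[] f = line.split("\\|+");
                    String id = f[1], name = f[2];
                    String time = out.format(in.parse(f[3]));
                    for (int i = 4; i < f.length; i++) {
                        String[] m = f[i].split("\\^", 2);   // MEASURE^DATA
                        csv.printf("%s,%s,%s,%s,%s%n", id, name, time,
                                   m[0], m.length > 1 ? m[1] : "");
                    }
                }
            }
        }
    }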
Let me know if you need further clarification
You could just parse through the text file and use CSVParser from Apache to write the CSV file. Additionally, if you want to write to Excel, you can simply use Apache POI or JXL for that.
You can use SuperCSV to parse the file into a bean and also to create a CSV file.