Writing multiple files with Spring Batch - java

I'm a newbie in Spring Batch, and I would appreciate some help resolving this situation: I read some files with a MultiResourceItemReader and do some marshalling work. In the ItemProcessor I receive a String and return a Map<String, List<String>>. My problem is that in the ItemWriter I need to iterate over the keys of the Map and, for each one of them, generate a new file containing the value associated with that key. Can someone point me in the right direction for creating the files?
I'm also using a MultiResourceItemWriter because I need to generate files with a maximum number of lines.
Thanks in advance

Well, I finally got a solution. I'm not really excited about it, but it's working and I don't have much more time, so I've extended MultiResourceItemWriter and overridden the write method, processing the map's elements and writing the files myself.
In case anyone out there needs it, here it is.
@Override
public void write(List items) throws Exception {
    for (Object o : items) {
        // do some processing here
        writeFile(anotherObject);
    }
}

private void writeFile(AnotherObject anotherObject) throws IOException {
    File file = new File("name.xml");
    boolean restarted = file.exists();
    FileUtils.setUpOutputFile(file, restarted, true, true);
    StringBuffer sb = new StringBuffer();
    sb.append(xStream.toXML(anotherObject));
    FileOutputStream os = new FileOutputStream(file, true);
    BufferedWriter bufferedWriter = new BufferedWriter(new OutputStreamWriter(os, Charset.forName("UTF-8")));
    bufferedWriter.write(sb.toString());
    bufferedWriter.close();
}
And that's it. I want to believe that there is a better option that I don't know about, but for the moment this is my solution. If anyone knows how I can improve my implementation, I'd like to hear it.
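For reference, a rough sketch of one possible alternative (not tested against the job above) is to skip MultiResourceItemWriter entirely and implement a plain ItemWriter that writes one file per map key. The PerKeyFileWriter name, the output directory and the ".txt" suffix are made up for the example, and the write(List) signature is the Spring Batch 4 one:
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;

import org.springframework.batch.item.ItemWriter;

// Hypothetical writer: one output file per map key, lines appended per chunk.
public class PerKeyFileWriter implements ItemWriter<Map<String, List<String>>> {

    private final Path outputDirectory = Paths.get("output"); // assumed location

    @Override
    public void write(List<? extends Map<String, List<String>>> items) throws Exception {
        for (Map<String, List<String>> item : items) {
            for (Map.Entry<String, List<String>> entry : item.entrySet()) {
                writeLines(entry.getKey(), entry.getValue());
            }
        }
    }

    private void writeLines(String key, List<String> lines) throws IOException {
        Path file = outputDirectory.resolve(key + ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }
}
Note that the "maximum number of lines per file" requirement from the question would still have to be added by hand, for example by rolling the file name once a per-key counter passes the limit.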

Related

Java 11 handle unmodifiable map

I have a class to process some files which are uploaded zipped, and a method to unzip them, fill a HashMap, and convert it to an unmodifiable map via Collections.unmodifiableMap.
public class MyClass extends HttpServlet {
    ...
    private Map<String, String> rnaseqfiles = new HashMap<>();
    ...
    private void processZipFile(String zipfile) throws Exception {
        String fileName = zipfile;
        byte[] buffer = new byte[1024];
        try (ZipInputStream zis = new ZipInputStream(new FileInputStream(fileName))) {
            ZipEntry zipEntry = zis.getNextEntry();
            while (zipEntry != null) {
                File newFile = new File(diretorio, zipEntry.toString());
                if (zipEntry.isDirectory()) {
                    if (!newFile.isDirectory() && !newFile.mkdirs()) {
                        throw new IOException("Failed to create directory " + newFile);
                    }
                } else {
                    FileOutputStream fos = new FileOutputStream(newFile);
                    int len;
                    while ((len = zis.read(buffer)) > 0) {
                        fos.write(buffer, 0, len);
                    }
                    fos.close();
                    rnaseqfiles.put(zipEntry.toString(), newFile.getAbsolutePath());
                }
                zipEntry = zis.getNextEntry();
            }
            rnaseqfiles = Collections.unmodifiableMap(rnaseqfiles);
            zis.closeEntry();
            zis.close();
        }
    }
    ...
}
When I test with a small example it works nicely, but when I try the real case I get this kind of error:
java.lang.UnsupportedOperationException
at java.base/java.util.Collections$UnmodifiableMap.put(Collections.java:1457)
I found some hints to deal with it but I don't know exactly what to do.
Any help is appreciated
Servlets are quite annoying. Keep in mind that any given servlet is likely going to run many times, and probably many times simultaneously, as various users hit your site.
They are the worst of both worlds: the servlet spec does not guarantee that the system initializes a new object for every request (meaning it is possible that many different requests, some even simultaneous, are all using the same fields), but it also does not guarantee the opposite: the system is free to do so.
Conclusion: fields in servlets are pretty much useless. But you have one, and it's causing trouble: one 'run' overwrites your mutable HashMap with an immutable one, and then the next request tries to add stuff to this now-immutable map.
The fix is generally to just get rid of servlets. There are better ways to write web apps these days, such as Spark, Dropwizard, Spring, and many others.
If you insist, then your servlets should not have any fields. If you still want per-request state, your servlet code should simply make a new object and then invoke whatever you want there - your doGet and friends are mostly just one-liners of the form new ActualHandler(req, res).go() or similar. Now you actually have one instance per request.
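A minimal sketch of that pattern, assuming ActualHandler is a class you write yourself to hold all per-request state (the servlet itself keeps no fields):
import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MyClass extends HttpServlet {

    // No fields here: every request gets its own ActualHandler instance.

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        new ActualHandler(req, res).go();
    }
}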
Or, just write the code so that no fields are needed. I don't see why you need a field here, for example. Your current code does the following:
Receive the request and parse stuff out (you didn't paste this part)
That code evidently invokes processZipFile which returns nothing, but conveys data back using a field. (This does not work in servlets).
Your request handling code then uses that field for stuff.
Seems easy to replace that: don't have a field; have the processZipFile method return that map instead.
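A sketch of that last suggestion, reusing the question's own unzip code but returning the map instead of storing it in a field (diretorio becomes a parameter here; imports are the same as in the original class):
private Map<String, String> processZipFile(String zipfile, File diretorio) throws IOException {
    Map<String, String> files = new HashMap<>();
    byte[] buffer = new byte[1024];
    try (ZipInputStream zis = new ZipInputStream(new FileInputStream(zipfile))) {
        ZipEntry zipEntry;
        while ((zipEntry = zis.getNextEntry()) != null) {
            File newFile = new File(diretorio, zipEntry.getName());
            if (zipEntry.isDirectory()) {
                if (!newFile.isDirectory() && !newFile.mkdirs()) {
                    throw new IOException("Failed to create directory " + newFile);
                }
            } else {
                try (FileOutputStream fos = new FileOutputStream(newFile)) {
                    int len;
                    while ((len = zis.read(buffer)) > 0) {
                        fos.write(buffer, 0, len);
                    }
                }
                files.put(zipEntry.getName(), newFile.getAbsolutePath());
            }
        }
    }
    // The caller keeps the result in a local variable instead of a servlet field.
    return Collections.unmodifiableMap(files);
}
The request-handling code then calls this method and works with the returned map locally, so no immutable map ever ends up shared between requests.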

How to properly wire together readers and writers for PostgreSQL COPY?

I'm trying to use PostgreSQL COPY command to import data without creating temporary files. Initial data is in a format that requires conversion so I have something like this:
void itemsToCsv(String filename, Writer writer) throws IOException {
    /* Start processing */
    CSVPrinter printer = new CSVPrinter(writer, CSVFormat.DEFAULT);
    // processedFile: the rows parsed from the input file (parsing not shown)
    for (String[] row : processedFile) {
        printer.printRecord(row);
    }
}
Now here comes the problem. I was thinking about using something like this code
CopyManager copyMgr = ((BaseConnection) conn).getCopyAPI();
Writer w = new PipedWriter();
Reader r = new PipedReader(w);
long rowsCopied = copyMgr.copyIn("COPY items_import (id, name) FROM STDIN CSV", r);
itemsToCsv("items_in_weird_format.xml", w);
As far as I understand, the copyIn method will block, because there's nothing to read yet. Is there some clever way to wire readers and writers together without resorting to CopyIn.writeToCopy method?
UPD: Solved by wrapping the call to itemsToCsv in a separate thread.
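For anyone who hits the same problem, here is roughly what that looks like (a sketch, assuming the CopyManager is obtained as in the snippet above and itemsToCsv is the method from the question; imports and exception handling are trimmed):
CopyManager copyMgr = ((BaseConnection) conn).getCopyAPI();

PipedWriter w = new PipedWriter();
PipedReader r = new PipedReader(w);

// Produce the CSV on a separate thread so that copyIn() can start
// consuming from the piped reader right away instead of deadlocking.
Thread producer = new Thread(() -> {
    try (Writer writer = w) {   // closing the writer signals end-of-stream to copyIn()
        itemsToCsv("items_in_weird_format.xml", writer);
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
});
producer.start();

long rowsCopied = copyMgr.copyIn("COPY items_import (id, name) FROM STDIN CSV", r);
producer.join();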

Reading and writing files using Java 7 NIO

I have files which consist of JSON elements in an array (several files; each file has a JSON array of elements).
I have a process that takes each JSON element as a line from a file and processes it.
So I created a small program that reads the JSON array and then writes the elements, one per line, to another file.
The output of this utility will be the input of the other process.
I used Java 7 NIO (and Gson), and tried to use as much of Java 7 NIO as possible.
Is there any improvement I can make?
What about the filter? Which approach is better?
Thanks,
public class TransformJsonsUsers {

    public TransformJsonsUsers() {
    }

    public static void main(String[] args) throws IOException {
        final Gson gson = new Gson();
        Path path = Paths.get("C:\\work\\data\\resources\\files");
        final Path outputDirectory = Paths
                .get("C:\\work\\data\\resources\\files\\output");
        DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
            @Override
            public boolean accept(Path entry) throws IOException {
                // which is better?
                // BasicFileAttributeView attView = Files.getFileAttributeView(entry, BasicFileAttributeView.class);
                // return attView.readAttributes().isRegularFile();
                return !Files.isDirectory(entry);
            }
        };
        DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path, filter);
        directoryStream.forEach(new Consumer<Path>() {
            @Override
            public void accept(Path filePath) {
                String fileOutput = outputDirectory.toString() + File.separator + filePath.getFileName();
                Path fileOutputPath = Paths.get(fileOutput);
                try {
                    BufferedReader br = Files.newBufferedReader(filePath);
                    User[] users = gson.fromJson(br, User[].class);
                    BufferedWriter writer = Files.newBufferedWriter(fileOutputPath, Charset.defaultCharset());
                    for (User user : users) {
                        writer.append(gson.toJson(user));
                        writer.newLine();
                    }
                    writer.flush();
                } catch (IOException e) {
                    throw new RuntimeException(filePath.toString(), e);
                }
            }
        });
    }
}
There is no point in using a Filter if you want to read all the files from the directory. A Filter is primarily designed to apply some criteria and read only a subset of files. Either way, the two approaches you asked about are unlikely to make any real difference in overall performance.
If you are looking to improve performance, you can try a couple of different approaches.
Multi-threading
Depending on how many files exist in the directory and how powerful your CPU is, you can apply multi-threading to process more than one file at a time (a sketch follows below).
Queuing
Right now you are reading and writing to another file synchronously. You could queue each file's content using a Queue and create an asynchronous writer.
You can combine both of these approaches to improve performance further.
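A rough sketch of the multi-threading suggestion (the pool size, the transformFile method, and the surrounding error handling are placeholders, not part of the reviewed code):
ExecutorService pool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());
try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path, filter)) {
    for (Path filePath : directoryStream) {
        // transformFile(...) would contain the body of the accept() method above
        pool.submit(() -> transformFile(filePath));
    }
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);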
Don't put the I/O into the filter. That's not what it's for. You should get the complete list of files first and then process it. For example, if the I/O creates another file in the directory, the behaviour is undefined: you might miss a file, or see the new file in the accept() method.
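A tiny sketch of that approach, reusing the question's path and filter variables (the per-file processing stays exactly as in the original accept() body):
List<Path> files = new ArrayList<>();
try (DirectoryStream<Path> directoryStream = Files.newDirectoryStream(path, filter)) {
    for (Path entry : directoryStream) {
        files.add(entry);   // take a snapshot of the listing first
    }
}
for (Path filePath : files) {
    // ...transform filePath as in the original code...
}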

Add comment to an ARFF file

This is my first question in this forum.
I'm making a data-mining application in Java with the WEKA API.
I first run a pre-processing stage, and when I save the ARFF file I would like to add a couple of lines (as comments) specifying the preprocessing tasks that I have applied to the file.
The problem is that I don't know how to add comments to an ARFF file from the Java WEKA API.
To save the file I use the class ArffSaver like this:
try {
    ArffSaver saver = new ArffSaver();
    saver.setInstances(dataPost);
    saver.setFile(arffFile);
    saver.writeBatch();
    return true;
} catch (IOException ex) {
    Logger.getLogger(Preprocesamiento.class.getName()).log(Level.SEVERE, null, ex);
    return false;
}
I would be really grateful if someone could give me some idea.
Thanks!
You should AVOID writing comments into an .arff file, even more so when writing it from Java. These files are very parser-sensitive, and the Weka API for creating them is restrictive for this particular reason.
That said, you can always add your comments manually with the % symbol. Even so, I wouldn't recommend writing anything more than instances, attributes and values into an .arff file. ;-)
I don't see a reason to not write comments into the header of an ARFF file. The specification clearly says:
Lines that begin with a % are comments.
So while it is technically valid, it can be difficult if you want to use the ArffSaver#setFile method. This method does a lot of (convenient, but somewhat arbitrary and unspecified) work internally, until it finally calls
setDestination(new FileOutputStream(m_outputFile));
If this is not required, the easiest option is to write directly to an OutputStream, which then can simply be set as the destination for the ArffSaver. This can be wrapped in a small helper method, for example, like this:
static void writeArff(
        Instances instances,
        List<String> commentLines,
        OutputStream outputStream) throws IOException
{
    ArffSaver saver = new ArffSaver();
    saver.setInstances(instances);
    if (commentLines != null && !commentLines.isEmpty())
    {
        BufferedWriter bw = new BufferedWriter(
            new OutputStreamWriter(outputStream));
        for (String commentLine : commentLines)
        {
            bw.write("% " + commentLine + "\n");
        }
        bw.write("\n");
        bw.flush();
    }
    saver.setDestination(outputStream);
    saver.writeBatch();
}
When calling it like this
List<String> comments = Arrays.asList("A comment", "Another one");
writeArff(instances, comments, outputStream);
then the given comments will be inserted at the top of the ARFF file.

Update only the new content in a file every five minutes

I get a file personHashMap.ser with a HashMap in it. Here's the code that creates it:
String file_path = ("//releasearea/ToolReleaseArea/user/personHashMap.ser");

public void createFile(Map<String, String> newContent) {
    try {
        File file = new File(file_path);
        FileOutputStream fos = new FileOutputStream(file);
        ObjectOutputStream oos = new ObjectOutputStream(fos);
        oos.writeObject(newContent);
        oos.flush();
        oos.close();
        fos.close();
    } catch (Exception e) {
        System.err.println("Error in FileWrite: " + e.getMessage());
    }
}
Now, while the program is running, I want to update the file personHashMap.ser every five minutes, but only with the content that has changed. So this is the method I call:
public void updateFile(Map<String, String> newContent) {
    Map<String, String> oldLdapContent = readFile();
    if (!oldLdapContent.equals(ldapContent)) { // they aren't the same,
                                               // so I must update the file
    }
}
But now I have no idea how to realise that.
And is it better for performance to update only the new content, or should I clear the whole file and write the new map again?
Hope you can help me.
EDIT:
The HashMap includes, for example, street=Example Street. But now the street is called New Example Street, so I must update the HashMap in the file; I can't just append the new content...
Firstly, HashMap isn't really an appropriate choice. It's designed for in-memory usage, not serialization (though of course it can be serialized in the standard way). But if it's just 2 KB, then go ahead and write the whole thing rather than only the updated data.
Second, you seem to be overly worried about the performance of this rather trivial method (for 2 KB the write will take mere milliseconds). I would worry more about consistency and concurrency issues. I suggest you look into using a lightweight database such as JavaDB or H2.
Use the constructor FileOutputStream(File file, boolean append) and set the boolean append to true. It will append the text to the existing file.
You can call the updateFile method in a loop and then sleep for 5 minutes (5*60*1000 ms):
Thread.sleep(300000); // sleep for 5 minutes
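Combining the two suggestions above, a sketch of such a loop could look like this (readFile(), createFile() and ldapContent come from the question; the running flag and the exception handling are assumed and left out):
// Poll every five minutes and rewrite the whole (small) file
// whenever the in-memory map differs from what is on disk.
while (running) {
    Map<String, String> oldContent = readFile();   // deserialize personHashMap.ser
    if (!oldContent.equals(ldapContent)) {
        createFile(ldapContent);                   // rewrite the complete map
    }
    Thread.sleep(5 * 60 * 1000);                   // 5 minutes
}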
To append to your already existing file you can use:
FileOutputStream fooStream = new FileOutputStream(file, true);
