Spring Integration: Allocate Space when sending file to FTP

I have the following problem:
We are sending files to an FTP server. We never had problems as long as the files were smaller than 5 MB; if the file size is greater than 5 MB, we get an abend (an abnormal end) with this error:
In order to "solve" this issue, we need to allocate space before sending the file to the FTP server, with something like this:
QUOTE SITE BLOCKSIZE=0 LRECL=256 WRAP UNIT=DISK RECFM=VB PRI=50 SEC=50 CYL
Currently I'm using a DefaultFtpSessionFactory along with a FileTransferringMessageHandler to send files to the FTP server (this works well unless the file is larger than 5 MB).
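For context, a minimal sketch of that setup (untested; host, credentials, and remote directory are placeholder values, not from the original question):
import org.apache.commons.net.ftp.FTPFile;
import org.springframework.expression.common.LiteralExpression;
import org.springframework.integration.file.remote.handler.FileTransferringMessageHandler;
import org.springframework.integration.ftp.session.DefaultFtpSessionFactory;

DefaultFtpSessionFactory sessionFactory = new DefaultFtpSessionFactory();
sessionFactory.setHost("mainframe.example.com"); // placeholder
sessionFactory.setUsername("user");              // placeholder
sessionFactory.setPassword("secret");            // placeholder

// The handler pushes each incoming Message payload to the remote directory
FileTransferringMessageHandler<FTPFile> handler =
        new FileTransferringMessageHandler<>(sessionFactory);
handler.setRemoteDirectoryExpression(new LiteralExpression("/upload")); // placeholder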
My question is: Is there a way to solve this issue using Spring?

I haven't tried this, but you can extend DefaultFtpSessionFactory and override its postProcessClientAfterConnect() method.
There you can send the SITE command. Note that QUOTE is just a client-side convention for sending a raw FTP command, and FTPClient.sendSiteCommand() already prefixes its argument with SITE, so you pass only the parameters, in one call:
ftpClient.sendSiteCommand("BLOCKSIZE=0 LRECL=256 WRAP UNIT=DISK RECFM=VB PRI=50 SEC=50 CYL");
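A minimal sketch of such a subclass (untested; the class name is mine, and the SITE parameters are simply the ones from the question):
import java.io.IOException;

import org.apache.commons.net.ftp.FTPClient;
import org.springframework.integration.ftp.session.DefaultFtpSessionFactory;

public class AllocatingFtpSessionFactory extends DefaultFtpSessionFactory {

    @Override
    protected void postProcessClientAfterConnect(FTPClient ftpClient) throws IOException {
        // Sends "SITE BLOCKSIZE=0 LRECL=256 ..." so the server pre-allocates the dataset
        boolean accepted = ftpClient.sendSiteCommand(
                "BLOCKSIZE=0 LRECL=256 WRAP UNIT=DISK RECFM=VB PRI=50 SEC=50 CYL");
        if (!accepted) {
            throw new IOException("SITE command rejected: " + ftpClient.getReplyString());
        }
    }
}

Plugging this factory into your existing FileTransferringMessageHandler should then run the allocation once per connection, before any transfer.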

Related

Merging PDFs with Sejda fails with stream output

Using Sejda 1.0.0.RELEASE, I basically followed the tutorial for splitting a PDF but tried merging instead (org.sejda.impl.itext5.MergeTask, MergeParameters, ...). All works great with the FileTaskOutput:
parameters.setOutput(new FileTaskOutput(new File("/some/path/merged.pdf")));
However I am unable to change this to StreamTaskOutput correctly:
OutputStream os = new FileOutputStream("/some/path/merged.pdf");
parameters.setOutput(new StreamTaskOutput(os));
parameters.setOutputName("merged.pdf");
No error is reported, but the resulting file cannot be read by Preview.app and is approximately 31 kB smaller (out of the ~1.2 MB total result) than the file saved above.
My first idea was: the stream is not being closed properly! So I added os.close(); to the end of the CompletionListener; still the same problem.
Remarks:
The reason I need to use StreamTaskOutput is that this merge logic will live in a web app, and the merged PDF will be sent directly over HTTP. I could store the temporary file and serve that one, but that is a hack.
Due to licensing issues, I cannot use the iText 5 version of the task.
Edit
Turns out, the reason is that StreamTaskOutput zips the result into a ZIP file! OutputWriterHelper.copyToStream() is the culprit. If I rename merged.pdf to merged.zip, it's a valid ZIP file containing a perfectly valid merged.pdf file!
Could anyone (dear authors of the library) comment on why this is happening?
The idea is that when a task consumes MultipleOutputTaskParameters, producing multiple output documents, the StreamTaskOutput has to group them to be able to write all of them to a single stream output. Unfortunately Sejda currently applies the same logic to SingleOutputTaskParameters, hence your issue. We can fix this in Sejda 2.0, because it makes more sense to stream the output document directly in the case of SingleOutputTaskParameters. For Sejda 1.x I'm not sure how to address this while remaining compatible with the existing behaviour.
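Until then, one possible workaround for 1.x (my own sketch, not an official Sejda API) is to let StreamTaskOutput write its ZIP into a buffer and extract the single PDF entry before streaming it to the HTTP response:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class ZipUnwrapper {

    // Copies the first (and, for a single-output task, only) entry of the
    // ZIP produced by StreamTaskOutput to the real target stream.
    public static void unwrapFirstEntry(byte[] zipBytes, OutputStream target) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zis.getNextEntry();
            if (entry == null) {
                throw new IOException("StreamTaskOutput produced an empty ZIP");
            }
            byte[] buffer = new byte[8192];
            int read;
            while ((read = zis.read(buffer)) != -1) {
                target.write(buffer, 0, read);
            }
        }
    }
}

That is, pass a ByteArrayOutputStream to StreamTaskOutput instead of the response stream, then hand its toByteArray() to this helper together with the HTTP response's output stream.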

Weka CSV loader limit

I'm using Weka for a sentiment analysis project I'm working on. I'm using the Weka CSV Loader to load the training instances from a CSV file, but for some reason, if I load more than 70 instances, the program throws a "java.lang.ArrayIndexOutOfBoundsException: 2" exception. I found that you can pass options to the Weka CSV Loader:
-B
The size of the in memory buffer (in rows).
(default: 100)
This may be the one I need to set to get rid of the error, but I'm not sure how to do that from a Java project. If anyone can help me with this, I would appreciate it greatly.
UPDATE: The buffer size change didn't help; the problem comes from somewhere else.
How I'm using the loader:
private void getTrainingDataset(final String INPUT_FILENAME)
{
    try {
        // reading the training dataset from a CSV file
        CSVLoader trainingLoader = new CSVLoader();
        trainingLoader.setSource(new File(INPUT_FILENAME));
        inputDataset = trainingLoader.getDataSet();
    } catch (IOException ex) {
        System.out.println("Exception in getTrainingDataset Method");
    }
}
UPDATE: for those who want to know where the exception occurs
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at weka.core.converters.CSVLoader.getInstance(CSVLoader.java:1251)
at weka.core.converters.CSVLoader.readData(CSVLoader.java:866)
at weka.core.converters.CSVLoader.readHeader(CSVLoader.java:1150)
at weka.core.converters.CSVLoader.getStructure(CSVLoader.java:924)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:836)
at sentimentanalysis.SentimentAnalysis.getTrainingDataset(SentimentAnalysis.java:209)
at sentimentanalysis.SentimentAnalysis.trainClassifier(SentimentAnalysis.java:134)
at sentimentanalysis.SentimentAnalysis.main(SentimentAnalysis.java:282)
UPDATE: Even for under 70 instances, the classifier also gives an error after a few runs. Everything works fine for 10-20 instances, but it all falls apart beyond that :)
Weka reads the CSV twice: the first pass, limited to the buffer size (in rows), extracts the classes of the nominal attributes; the second pass reads the entire file.
The classes of each nominal attribute must match the classes in the training set (no more, no less).
Increase the buffer size to more than the number of rows, as shown in the sketch below.
If the error still occurs, look for a class that is not present in both files.
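For reference, a minimal sketch of setting that option from Java (CSVLoader implements OptionHandler, so setOptions() should accept the -B flag; rowsInFile is a hypothetical parameter you would size to your file):
import java.io.File;

import weka.core.Instances;
import weka.core.converters.CSVLoader;

public class TrainingDataLoader {

    public Instances load(String inputFilename, int rowsInFile) throws Exception {
        CSVLoader loader = new CSVLoader();
        // -B is the in-memory buffer size (in rows) used by the first parsing pass;
        // make it larger than the total number of rows in the file.
        loader.setOptions(new String[] {"-B", String.valueOf(rowsInFile + 1)});
        loader.setSource(new File(inputFilename));
        return loader.getDataSet();
    }
}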

Mule originalFilename is null

I am creating a process using Mule 3.4.1 which, after processing a file, writes the file out with a specific filename.
The input filename is: MMDDYYYY_sys_newhires.csv
The processed filename is: MMDDYYYY_sys_newhires_NNN.csv
The code that I am using is below:
#[filename = message.inboundProperties.originalFilename;
filename= com.cfa.apps.icims.mule.CounterSingleton.getInstance().getCount()
+ filename.substring(0,filename.length() -1 -4) + ".csv";
filename]
The problem exists in the first line. message.inboundProperties.originalFilename.
I have tried a number of different combinations
message.inboundProperties.originalFilename
message.inboundProperties['originalFilename']
message.inboundProperties.originalFileName
message.inboundProperties['originalFileName']
message.inboundProperties.sourceFilename
message.inboundProperties['sourceFilename']
message.inboundProperties.sourceFileName
message.inboundProperties['sourceFileName']
I have also tried nesting #[header:originalFilename]; this works by itself, but as far as I know you can't nest that expression within the code above.
Any help?
UPDATE: I am using the inbound file transport
Since you don't show the endpoint configuration, I'm going to assume that this is happening with a file inbound endpoint.
For a reason that goes beyond imagination, the file message receiver behind the inbound endpoint puts the originalFilename property in the outbound scope when evaluating the expression to generate the archived file name.
So use: message.outboundProperties.originalFilename
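Applied to the expression from the question, that would give (untested; only the property scope is changed, everything else is the original code):
#[filename = message.outboundProperties.originalFilename;
filename = com.cfa.apps.icims.mule.CounterSingleton.getInstance().getCount()
    + filename.substring(0, filename.length() - 1 - 4) + ".csv";
filename]

As an aside, substring(0, filename.length() - 4) would strip exactly ".csv"; the original - 1 - 4 drops one extra character.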
I have tested this MEL expression in Mule ESB 3.7.1 and it works fine:
message.inboundProperties.originalFilename == 'firstfile.txt'

Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException

I have a Google App Engine app that converts XML files to CSV. It works fine for small XML inputs, but refuses to finalize the file for larger input XML. The XML is read from, and the resulting CSV files are written to, many times before finalization, over a long-running (multi-day) task.

My problem is different from the question FileServiceFactory getBlobKey throws IllegalArgumentException, since my code works fine both in production and development with small input files, so it's not that I'm neglecting to write to the file before closing/finalizing. The failure only appears when I attempt to read from a larger XML file: the input XML file is ~150 MB, and each of the resulting 5 CSV files is much smaller (perhaps 10 MB each). I persisted the file URLs for the new CSV files, and even tried to close them with some static code, but I just reproduce the same error, which is:
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using file service URLs of similar length for the previous successful attempts with smaller files. It also wasn't an issue in the other Stack Overflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this. Here is my file-closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore:
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files, for which I have the URLs, but cannot. How can I get around this issue, and what may be the cause? Again, my code works for small files (~60 kB), but the ~150 MB input file fails. Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around before being deleted?
This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read the announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo

GZip a string for output from Coldfusion results in "Content Encoding Error" in browsers

I am trying to GZip content in a variable to output to the browser. To start, I am keeping this very simple and not worrying about browsers that do not support gzip. I have put this together from researching several methods I found on the web, some of them from people who may be reading this question.
<cfsavecontent variable="toGZIP"><html><head><title>Test</title></head><body><h1>Fear my test</h1></body></html></cfsavecontent>
<cfscript>
ioOutput = CreateObject("java","java.io.ByteArrayOutputStream");
gzOutput = CreateObject("java","java.util.zip.GZIPOutputStream");
ioOutput.init();
gzOutput.init(ioOutput);
gzOutput.write(toGZIP.getBytes("UTF-8"), 0, Len(toGZIP.getBytes()));
gzOutput.finish();
gzOutput.close();
ioOutput.flush();
ioOutput.close();
toOutput=ioOutput.toString("UTF-8");
</cfscript>
<cfcontent reset="yes" /><cfheader name="Content-Encoding" value="gzip"><cfheader name="Content-Length" value="#ArrayLen( toOuptut.getBytes() )#" ><cfoutput>#toOuptut#</cfoutput><cfabort />
But I get an error in Firefox (and Chrome and Safari):
Content Encoding Error
The page you are trying to view cannot be shown because it uses an invalid or unsupported form of compression.
Anybody have any ideas?
OS: Mac OS X Snow Leopard
CF: 9-Dev
Webserver: Apache
SOLUTION
<cfsavecontent variable="toGZIP"><html><head><title>Test</title></head><body><h1>Fear my test</h1></body></html></cfsavecontent>
<cfscript>
ioOutput = CreateObject("java","java.io.ByteArrayOutputStream");
gzOutput = CreateObject("java","java.util.zip.GZIPOutputStream");
ioOutput.init();
gzOutput.init(ioOutput);
gzOutput.write(toGZIP.getBytes(), 0, Len(toGZIP.getBytes()));
gzOutput.finish();
gzOutput.close();
ioOutput.flush();
ioOutput.close();
toOutput=ioOutput.toByteArray();
</cfscript>
<cfheader name="Content-Encoding" value="gzip"><cfheader name="Content-Length" value="#ArrayLen(toOutput)#" ><cfcontent reset="yes" variable="#toOutput#" /><cfabort />
The following line looks completely wrong:
toOutput=ioOutput.toString("UTF-8");
You are decoding the GZip stream as UTF-8. The result is garbage data, because decoding arbitrary binary as text alters bytes that are not valid UTF-8 sequences. It is best to output the GZip data as binary, if ColdFusion has that option. If you can only output a string, then you need an encoding that does not change any bytes, for example ISO-8859-1.
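A minimal Java sketch (my own illustration, not from the original post) of why that string round trip corrupts the data:
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipStringRoundTrip {

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("<html>Fear my test</html>".getBytes(StandardCharsets.UTF_8));
        }
        byte[] original = bos.toByteArray();

        // Decoding arbitrary binary as UTF-8 replaces invalid byte sequences,
        // so re-encoding cannot reproduce the original bytes.
        byte[] roundTripped = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(original, roundTripped)); // prints false
    }
}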
Is there a reason you're doing this manually instead of letting the web server (IIS or Apache) handle it? Both of them support GZip encoding, and will probably do it faster and better than your manual process.
Enabling GZip in IIS6
Enabling GZip in IIS7
Enabling GZip in Apache2
Please note that you have a typo in the code: toOuptut instead of toOutput.
Unfortunately, I'm not a Java expert and can't say exactly what is wrong. But when I save the response to a file using wget, it contains not gzipped binary but the source HTML. That suggests the gzOutput-related processing does not produce correct output.
BTW, verifying browser support for GZip is pretty simple: check the Accept-Encoding header, like this:
<cfif FindNoCase("gzip", cgi.HTTP_ACCEPT_ENCODING)>
<!--- prepare the gzipped text --->
</cfif>
