I'm writing a web service using Axis and Apache Tomcat 7.
The service uses a third-party library to do some conversions on a file, and ends up creating a folder which contains more files (subfolders and regular files). When the conversion is completed, the service creates a zip archive and returns it.
When it receives a new request, first of all it removes the files created during the last request and then it starts handling the new one.
The service itself works fine; at least the first request is satisfied.
The problem is that when a second request is received, the service cannot delete all the files generated during the last request.
I'm using Windows XP, and with Process Explorer I can see that Tomcat is keeping some of the files (but not all of them) open, and that's why I can't delete them.
Is it possible that the library I'm using keeps the files open even after the service operation ends?
In the code I use to create the zip archive, it seems that I close all the streams I open. By the way, even if I forgot to close them, could they stay open after the service operation returns its result to the client?
And if so, why does the Tomcat process keep only some of the files open?
It seems that after some time some files are "released", but other files are always kept open...
I hope someone can give me some advice on how to handle this situation :)
Repost of my comment which seems to be useful.
If a file handler is not released, it will never be released until the servlet container is shut down. Some implementations may also delay releasing file handlers until the object is garbage collected. There is nothing you can do except make sure that you close all handlers. If it's the third party library, then you have to report a bug or fix it yourself.
My best practice for preventing this sort of problem is to make sure that a file handler is closed in the same method it is opened in. If it is not opened in that method, never close it there.
public void method() throws IOException {
    FileInputStream fin = new FileInputStream("somefile.txt"); // open file handler
    try {
        // do something
    } finally {
        fin.close(); // make sure it is closed even if there is an exception
    }
}
And never make a file handler a field.
public class A {
    private FileInputStream fin = null; // never do this: you will have a hard time keeping track of when to release it
}
Well, as I was wondering, the library I'm using doesn't close all the file streams..
I found a workaround that can be used in similar situations. I post it, maybe someone can find it useful or advise me a better way to solve the problem :)
Luckily the jar library can be executed from the command line with the java command, so I just executed it with Runtime's exec method in this way:
try {
    // run the conversion in a separate JVM and wait for it to finish
    Process p = Runtime.getRuntime().exec("java -jar library.jar parameters");
    p.waitFor();
} catch (InterruptedException | IOException e) {
    e.printStackTrace();
}
In this way a new process is created by the JVM and, when it dies, all the pending handlers are released.
If you can't execute the library with the java command, the library can simply be wrapped in a custom jar made of a simple class that uses it and takes the needed parameters. The wrapper jar can then be executed with exec().
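If you need such a wrapper, a minimal sketch could look like this (the class name LibraryWrapper, the library call ConversionTool.convert(), and the argument layout are all made up; substitute the real entry point of your library):
// Packaged as wrapper.jar with the third party library on the classpath,
// e.g. via a Class-Path entry in the manifest.
public class LibraryWrapper {
    public static void main(String[] args) throws Exception {
        // args[0] = input file, args[1] = output folder (assumed layout)
        ConversionTool.convert(args[0], args[1]); // hypothetical library call
        // when main() returns, the JVM exits and the operating system
        // releases every file handle the library may have leaked
    }
}
The service then runs exec("java -jar wrapper.jar input.doc outDir") followed by waitFor() as above; if the wrapper prints a lot of output, it is safer to consume the child process's output streams so waitFor() can't block on a full buffer.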
Note: I'm asking here because I guarantee the Security stack would close this for being a programming question.
I have a web application (in this case, Java on Tomcat) for which I occasionally need to allow the user to upload files. Even though I generally have trustworthy users, in my business we assume anybody and everybody could potentially be an insider threat (or just plain dumb). Therefore, I would like to have the uploaded file go directly to a "quarantine" directory, programmatically fire off a scan, and only if the scan succeeds, copy it to the intended destination folder for processing.
The only fly in the ointment is figuring out (a) how to fire off a scan, on demand, programmatically (let's assume we're using the McAfee suite of tools) and (b) how to get notification back when the scan is complete. Is it possible? If so, has anyone done it and can give me pointers?
We do this. We have a queue system so workers can pick up the file operations and perform them asynchronously, but the general flow is to scan the file using a command and update the database to track status.
Write the file to a dir.
Note the file information in a database with location=x; scanned=no;
Read the docs for McAfee, but there should be a way to run a scan via the command line or via an SDK. I'd probably run it via the command line to scan the file, and assume the command will return some information (0 on success, non-zero on error or bad results); a sketch of this step follows below.
If the file scanner returns non-zero, then set scanned=infected;
If the file scanner returns clean, then set scanned=clean;
Set the processing code to only process files that are scanned=clean;
Note: @David Conrad found the instructions for running the command line scanner: https://kc.mcafee.com/corporate/index?page=content&id=KB75478 ; upvote that guy.
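A rough sketch of the scan step from Java ("scan.exe" and its argument style are placeholders, and the exit-code meaning is an assumption; check the KB article above for the vendor's actual command line):
import java.io.IOException;

public class QuarantineScanner {
    /** Runs the command-line scanner on one file; true means the scan came back clean. */
    public static boolean isClean(String filePath) throws IOException, InterruptedException {
        // "scan.exe" is a placeholder; use the documented McAfee command
        Process p = new ProcessBuilder("scan.exe", filePath)
                .inheritIO() // let the scanner's output go to our stdout/stderr
                .start();
        int exit = p.waitFor();
        return exit == 0; // assumption: 0 = clean, non-zero = infected or error
    }
}
The worker that polls the queue would call isClean() and flip the database row to scanned=clean or scanned=infected accordingly.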
Every few days, I get a SocketException: too many open files. I tracked the issue to a temporary PDF file that is being used during a certain process. The process passes the name of a temporary file that the library creates. At some point, the library opens an input stream but doesn't close it. Given that my code only has the name of the file, is there any way for me to close the stream?
Details:
Java Web App running in Tomcat6
The best approach is to request a version of this library with this bug fixed.
If this is not possible, get the sources and fix the bug yourself.
If you can't (you only have a binary jar file), try a tool like jd-gui: decompile the faulty class, fix it, recompile that class, and replace the .class in the jar.
If that still does not work, use ASM and add a close statement at the right place. THIS SOLUTION SHOULD BE AVOIDED: it's complex if you do not master this technology.
I'm fairly new to Apache NiFi.
I'd like to set up a flow where a file gets put into a 'hot folder'. When a file is detected in this folder, it gets put into another folder called 'input'. Once the file is copied into the input folder, I'd like to trigger a Java program to run.
The way I've approached this is to create a 'GETFILE' processor to get the file from the hot folder, and then a 'PUTFILE' processor to put it in the input folder. So you can imagine there being a connection link between the 'GETFILE' and 'PUTFILE' processors. This works as expected.
However, the challenge I'm faced with is triggering my Java process to run when the file is copied into the INPUT folder (i.e. after the PUTFILE processor has executed). I cannot create a link between the PUTFILE and the EXECUTEPROCESS processor (as a means of telling NIFI to run the Java process after the file is copied from the hot folder to the input folder); NIFI won't let me draw the connection arrow between them.
Based on the above description, can anyone recommend an approach to tell NIFI to trigger the Java application to run after detecting the file being added to the input folder?
Thanks.
What you are looking to do makes a lot of sense, and we actually used to allow something similar with that processor. It turned out, though, that there were enough edge cases about what to do with the input flow file that it became rather problematic, so we now have a very explicit model, which basically means that processor combined with cron scheduling is a fancy cron tool.
So, what we have moved to instead is coming out in the NiFi 0.5.0 release, which should be out in a matter of days. In it we have https://issues.apache.org/jira/browse/NIFI-210, which is a really exciting feature that allows scripting to occur against the flow in-line. The ExecuteScript processor sounds perfect for your case. If you run the code below, for example, it is triggered by the presence of data, waits for the output of the external process, and captures it as flow file attributes. You could then even route on the content of the response, etc.
// get the incoming flow file; do nothing if there isn't one
def flowFile = session.get()
if (flowFile == null) {
    return
}
def procout = new StringBuffer(512), procerr = new StringBuffer(512)
// run the external command and capture its stdout/stderr
def proc = "java -version".execute()
proc.consumeProcessOutput(procout, procerr)
proc.waitForOrKill(1000)
// store the captured output as flow file attributes and pass the flow file on
flowFile = session.putAttribute(flowFile, "Process Output", procout.toString())
flowFile = session.putAttribute(flowFile, "Process Error", procerr.toString())
session.transfer(flowFile, REL_SUCCESS)
Let us know if you have more questions.
Thanks
Joe
I am polling the file system for new files, which are uploaded by someone from a web interface.
Now I have to process every new file, but before that I want to ensure that the file I am processing is complete (I mean to say it has been completely transferred through the web interface).
How do I verify that a file is completely transferred before processing it?
Renaming a file is an atomic action in most (if not all) filesystems. You can make use of this by uploading the file under a recognizable temporary name and renaming it as soon as the upload is complete.
This way you will "see" only those files that have been uploaded completely and are safe for processing.
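A minimal sketch of this pattern on the upload side (the paths and the ".part" suffix are illustrative; ATOMIC_MOVE assumes source and target are on the same filesystem):
import java.io.IOException;
import java.nio.file.*;

public class UploadReceiver {
    public static void finish(Path dir, String name) throws IOException {
        Path tmp = dir.resolve(name + ".part"); // written to while the upload runs
        Path done = dir.resolve(name);          // the poller only looks for this name
        // the rename is atomic, so the poller never sees a half-written file
        Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE);
    }
}
The polling side then simply skips anything ending in ".part".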
rsp's answer is very good. If, by any chance, it does not work for you, and if your polling code runs in a different process from the web server which is saving the file, you might want to try the following:
Usually, when a file is being saved, the sharing options are "allow anyone to read" and "allow no-one to write" (exclusive write). Therefore, you can attempt to open the file with write access yourself: if this fails, then you know that the web server is still holding the file open and writing to it. If it succeeds, then you know that the web server is done. Of course, be sure to test this, because I cannot guarantee that this is precisely how the web server chooses to lock the file.
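A hedged sketch of that probe (behavior is platform- and server-dependent, so treat a successful lock only as a strong hint; note that tryLock() throws rather than returning null if the lock is held within the same JVM):
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class CompletionProbe {
    /** Returns true if we could grab an exclusive lock, i.e. nobody else is writing. */
    public static boolean looksComplete(File file) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileLock lock = raf.getChannel().tryLock()) {
            return lock != null; // null means another process holds the lock
        } catch (IOException e) {
            return false; // could not even open for writing: writer still busy
        }
    }
}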
I'm currently using the commons-net library for the FTP client in my app. I have to download some files from a remote server, selected by some criteria based on the file name. This is a very simplified and reduced version of my actual code (because I do some checks and catch all possible exceptions), but the essence is there:
//ftp is FTPClient object
//...
FTPFile[] files = ftp.listFiles();
for (FTPFile ftpFile : files) {
    String name = ftpFile.getName();
    if (conformsCriteria(name)) {
        String path = outDirectory + File.separatorChar + name;
        OutputStream os = new FileOutputStream(path);
        ftp.retrieveFile(name, os);
        os.close();
    }
}
Now, what I noticed is that when I run this code, wait a few seconds, and then unplug the network cable, the output directory contains some "empty" files plus the files actually downloaded, which led me to believe that this method works somewhat asynchronously... But then again, some files are downloaded (size > 0 KB) and some are empty (size = 0 KB), which leads me to believe that the download is still serialized... Also, the retrieveFile() function returns, I quote the documentation:
True if successfully completed, false if not
What I need is a serialized download, because I need to log every unsuccessful one.
What I saw browsing through the commons-net source is that, if I'm not wrong, a new Socket is created for each retrieveFile() call.
I'm pretty confused about this, so if someone could explain what is actually happening, offer a solution with this library, or recommend some other FTP Java library that supports blocking download per file, that would be nice.
Thanks.
You could just use the java.net.URLConnection class that has been present forever. It should know how to handle FTP URLs just fine. Here is a simple example that should give the blocking behavior that you are looking for.
The caveat is that you have to manage the input/output streams yourself, but this should be pretty simple.
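Something along these lines (the URL and the output file name are placeholders; the read loop blocks until the whole file has been transferred or an exception is thrown):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class FtpUrlDownload {
    public static void download(String ftpUrl, String target) throws IOException {
        URLConnection conn = new URL(ftpUrl).openConnection();
        try (InputStream in = conn.getInputStream();
             OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) { // blocks until data arrives or EOF
                out.write(buf, 0, n);
            }
        }
    }
}
Usage would be something like download("ftp://user:pass@host/remote/file.txt", "file.txt").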
OK, to briefly answer this in order not to confuse people who might see this question:
Yes, commons-net for FTP works as I thought it would; that is, the retrieveFile() method blocks until it's finished with the download.
It was (of course) my own "mistake" in the code that led me to think otherwise.
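For the record, since retrieveFile() blocks and returns a boolean, every unsuccessful download (the original requirement) can be logged right at the call site; a minimal sketch, assuming commons-net's FTPClient (the logging call is illustrative):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.net.ftp.FTPClient;

public class LoggingDownload {
    /** Downloads one file and logs the server reply when it fails. */
    public static boolean fetch(FTPClient ftp, String name, String path) throws IOException {
        try (OutputStream os = new FileOutputStream(path)) {
            boolean ok = ftp.retrieveFile(name, os); // blocks until the transfer finishes
            if (!ok) {
                // getReplyString() contains the last reply from the FTP server
                System.err.println("Download failed for " + name + ": " + ftp.getReplyString());
            }
            return ok;
        }
    }
}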