Detecting newly created files through Java in real time


Using JDK 7, I've had success watching specific directories for file creations, deletions and modifications using java.nio.file.StandardWatchEventKinds.*
I'm hoping someone knows a way to get Java to detect new file creations regardless of their path.
I want to do this so I can calculate an MD5 sum for each newly written file.
Thanks for any advice you can offer.

OK, the short answer is that I don't think Java can do that out of the box. You'd either have to intercept calls to the operating system, which would require something closer to the bare metal, or do as suggested in another answer and register listeners on every folder from the root down, not to mention the other drives on Windows machines.
The first approach would need custom JNI code, and it assumes the OS has such a hook and allows user code to access it.
The second approach would work but could consume a large amount of memory to track all the listeners. On Windows, right-click on c:\ and select Properties to see just how many folders we're talking about.

One possibility - not a convenient one, but a possibility - is to walk the directory tree for the directories you want to watch, registering each in a WatchService. That's not a very nice way to go about it, and it could be a problem depending on how large the actual directory tree is.
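A minimal sketch of that tree walk, assuming Java 7's Files.walkFileTree is available. The class and method names (RecursiveWatcher, registerTree) are made up for illustration, and the sketch only registers for creation events:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchService;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: registers a whole directory tree with one WatchService.
class RecursiveWatcher {

    /** Registers root and every directory below it for ENTRY_CREATE; returns how many were registered. */
    static int registerTree(Path root, final WatchService watcher) throws IOException {
        final int[] count = {0};
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                    throws IOException {
                // One registration per directory: this is where the memory cost comes from.
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                count[0]++;
                return FileVisitResult.CONTINUE;
            }
        });
        return count[0];
    }
}
```

You would create the service with FileSystems.getDefault().newWatchService() and then loop on watcher.take(). Note that directories created after the walk are not covered automatically; the event loop would have to register them as their ENTRY_CREATE events arrive.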

I do not know StandardWatchEvents (although it sounds convenient).
One way to do what you want is to use a native Windows API such as ReadDirectoryChangesW (or volume changes). It's painful, but it works (been there, done that, wish I had had another option at the time).

Related

Java 7: Does knowing whether a file is a symbolic link help?

Java 7 provides a way to detect whether a file is a symbolic link or not, but why would anyone want to know that?
Files.isSymbolicLink(target) //here target is a path.
I have never needed it so far, so I am just wondering what the use of it is.

Suppose you're writing a recursive directory copy - you may decide not to follow symbolic links. Or maybe you're creating an archive in a format that doesn't support symbolic links - you may want to warn the user if you encounter one. Or maybe you're writing a diff program, and you want to skip pairs of files which are really the same file.
Basically it's a reasonably common property of some files in a file system - why would Java not want to expose that information?

One really good reason you might care about symbolic links is security. Sometimes you want to prevent people from accessing files outside a restricted area, so your app checks that the file being accessed is not a symbolic link that leads outside of your application's sandbox.
For example, if you're building a network-accessible application that runs as a user and accesses files by path, like a file sharing application, and you want to restrict where people can look for files on your user's system, symbolic links could be a security problem.
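As a rough sketch of such a check, assuming both paths exist on disk: resolve every symbolic link with Path.toRealPath() and compare the results. SandboxCheck and isInsideSandbox are hypothetical names, not part of any library:

```java
import java.io.IOException;
import java.nio.file.Path;

// Hypothetical helper illustrating a sandbox containment check.
class SandboxCheck {

    /**
     * Returns true if candidate, with all symbolic links and ".." segments
     * resolved, lies inside sandboxRoot. Both paths must exist, because
     * toRealPath() consults the file system.
     */
    static boolean isInsideSandbox(Path sandboxRoot, Path candidate) throws IOException {
        Path realRoot = sandboxRoot.toRealPath();
        Path realCandidate = candidate.toRealPath();
        // startsWith compares whole path elements, not string prefixes.
        return realCandidate.startsWith(realRoot);
    }
}
```

A link inside the sandbox that points elsewhere resolves to its target, so it fails the check. This does not protect against a link being swapped in between the check and the actual file access.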

Java web application - properties file nightmare

I recently started working on a POORLY designed and developed web application. I am finding that it uses about 300 properties files, and all of them are read somewhat like this:
Properties prop= new Properties();
FileInputStream fisSubsSysten = new FileInputStream("whatever.properties");
prop.load(fisSubsSysten);
That is, it reads the properties files from the current working directory. Another problem is that the developers chose to repeat the above lines multiple times within the same Java file. For example, if there are 10 methods, each method contains the above code instead of there being one method that is called wherever necessary.
This means we can NEVER change the location of the properties files. Currently they sit directly under the WebSphere profiles directory; isn't this ugly? If I move them somewhere else and add that location to the classpath, it does not work.
I tried changing the above lines like this using Spring IO utils library:
Resource resource = new ClassPathResource("whatever.properties");
Properties prop = PropertiesLoaderUtils.loadProperties(resource);
But this application has over 1000 files, and I am finding it impossible to change each file. How would you go about refactoring this mess? Is there an easy way around it?
Thanks!

In these cases of "refactoring" I use a simple find-and-replace approach. Notepad++ has a "Find in Files" feature, but there are plenty of similar programs.
Create a class which does the properties loading, with a method that probably takes the name of the properties file as a parameter.
This can be a Java singleton or a Spring bean.
Search and replace all "new Properties()" lines with an empty line.
Replace all "load..." lines with a reference to your new class/method. Notepad++ supports regex replacement, so you can use the filename as a parameter.
Once this is done, go to Eclipse, launch a "cleanup" or "organize imports", and fix any remaining compile errors manually if needed.
This approach is quite straightforward and takes no more than 10 minutes if you are lucky, or 1 hour if you are unlucky, for example if the code formatting is way off and each file looks different.
You can make the replacement simpler if you first format the project once with a line length of 300 or more, so that each Java statement is on one line. This makes find and replace a bit easier, as you don't have newlines to consider.
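The loader class described above could look roughly like this. It is only a sketch: the class name PropertiesLoader and the "config.dir" system property are assumptions made for illustration, not something the application already has. It reads from an external directory when that property is set and otherwise falls back to the classpath:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical single entry point for all properties loading.
final class PropertiesLoader {

    private PropertiesLoader() {
    }

    /** Loads the named properties file from config.dir if set, else from the classpath. */
    static Properties load(String name) throws IOException {
        try (InputStream in = open(name)) {
            Properties props = new Properties();
            props.load(in);
            return props;
        }
    }

    private static InputStream open(String name) throws IOException {
        String dir = System.getProperty("config.dir"); // assumed property name
        if (dir != null) {
            return Files.newInputStream(Paths.get(dir, name));
        }
        InputStream in = PropertiesLoader.class.getClassLoader().getResourceAsStream(name);
        if (in == null) {
            throw new FileNotFoundException(name + " not found on the classpath");
        }
        return in;
    }
}
```

Each replaced call site then becomes a single line such as Properties prop = PropertiesLoader.load("whatever.properties"); and moving the files later means changing one class instead of 1000.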

I can only agree that your project sounds a bit daunting, from your description.
However, the choice of how to maintain or improve it is a risk that simply needs to be assessed and prioritised.
Consider building a high-rise and subsequently realising that the bolts holding the structure together have a design flaw. The prospect of replacing them all is daunting as well, so you have to consider how to change them and whether they really, really need to be replaced: a few, many or all.
I assume it is a core system for the company, built by somebody who has probably left the project (?), and that you are weighing improving it against just maintaining it. But again, you must assess whether it really is important to move your property files, or whether you can, for instance, just use symbolic links in your file system. Alternatively, do you really need to move them all, or are there just a few that would really benefit from being moved? Can you just mark all the places in the code with a to-be-fixed-later marker? I sometimes mark bad classes as deprecated and promise to fix the affected classes, but postpone that until I have other changes to make in those classes, until finally the deprecated class can be safely removed.
Anyway, you should assess your options (leave the files, replace all or some of them), provide some estimate of cost and consequences, and ask your manager which course to take.
Just remember to always overestimate the solution you don't want to do, as you would be twice as likely to stop for coffee breaks, and a billboard of told-you-so's is great leverage for decision making :)
On the technology side of your question, regex search and replace is probably the only option. I would normally put configuration files in a place accessible by classpath.

You can try using Eclipse's search feature. For example, if you right-click on the load() method of the Properties class and select References -> Project, it will give you every location in your project where that method is used.
From there you can maybe also attempt a global regex search and replace.

Updating a Jar in production

I have a Swing/Java application that is used by clients and has weekly updates.
The problem is that the setup just lays out the classes in their respective directories, and on an update I just replace the classes.
I want to build a single jar containing all the classes, but I'm not sure how I would be able to update it.
Also, some clients need updates during the week where only one or two classes are changed.
What is the right way of doing this?
I'm using Eclipse.
EDIT: I know how to create the jar, I just don't know how to dynamically update it.

I would suggest you look into Java Web Start, which is designed to do exactly what you need.
You need to first create the finished deployment and then you can look into creating a JNLP file for the deployment, and a way to put all the files to a web server for the user to access. For now just let the user download the whole thing every time you update. When you get more experienced you can look into how you can make incremental updates.
I would strongly recommend including a build number or timestamp in the deployment paths so a jar file is not cached incorrectly in the client by accident.

The general way of doing this, even if only small changes were made, would be to repackage your JAR after each update and give that JAR to a user. The user would replace the existing JAR. How you produce your JAR is up to you. Many IDEs support it, you could write a shell script, use existing build systems like ant or maven or even make, whatever. (See edit below)
If your JAR is very large and deployment is cumbersome, you may be able to split your project into smaller subcomponents, each with their own JAR, and only redistribute the ones containing changes. That may or may not be a major refactoring for you, and it might not even be appropriate. For most small or average size projects, this is generally unnecessary.
As for deployment, there are also a zillion ways to make that easier on the user. You could just give them a JAR file. You could use e.g. install4j. You could use Java Web Start (although it's kind of clunky). There are others.
Both install4j and JWS support automatically checking for updates. If you choose to support that feature, all you would need to do is update your distribution site, and users would receive updates automatically. That's also up to you.
But a short answer to your question is: if you have all of your classes packaged in a JAR, no matter how many classes change, you'll want to give the entire updated JAR to the user. The benefit that counters this cost is that a JAR is a nice, compressed, self-contained collection of your application's or library's compiled classes, which makes management and deployment very convenient.
Edit: Responding to your edit where you specify that you are using Eclipse, Josh M gives you instructions in his comment on your question. To add to his comment: to export a Runnable Jar you'll need a Run Configuration set up, which you probably already have if you've been running your application in Eclipse. If not, you can create one in the Run menu.
Edit 2: Please refer to Thorbjørn Ravn Andersen's answer as well for some good JWS tips.

Java content APIs for a large number of files

Does anyone know of any open-source Java libraries that provide features for handling a large number of files (write/read) on disk? I am talking about 2-4 million files (most of them PDF and MS docs). It is not a good idea to store all the files in a single directory. Instead of reinventing the wheel, I am hoping that this has already been done by many people.
Features I am looking for:
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide version/audit (optional)
I was looking at the JCR API and it looks promising, but it starts with a workspace and I am not sure what the performance will be when there are many nodes.

Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use case.
If you're running your system on Windows and notice a horrible n^2 performance hit at some point, you're probably running up against the cost of automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
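Here is a rough sketch of both ideas, the first-n-letters layout and the content checksum as filename. ShardedStore and its method names are invented for illustration, SHA-256 is my own choice of checksum, and replacing non-alphanumeric characters with '_' is an assumption added so that a name like "a.pdf" never produces a directory called ".":

```java
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for spreading files over nested directories.
class ShardedStore {

    /** Builds root/d/o/c/u/m/document.pdf style paths from the first 'depth' characters. */
    static Path shardedPath(Path root, String fileName, int depth) {
        Path dir = root;
        int levels = Math.min(depth, fileName.length());
        for (int i = 0; i < levels; i++) {
            char c = fileName.charAt(i);
            // Assumption: anything that is not a letter or digit becomes '_'.
            dir = dir.resolve(String.valueOf(Character.isLetterOrDigit(c) ? c : '_'));
        }
        return dir.resolve(fileName);
    }

    /** Returns the SHA-256 digest of the content as lowercase hex, usable as a filename. */
    static String contentHashName(byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Feeding the hex checksum into shardedPath gives evenly spread directories, since hash output is close to uniformly distributed, whereas real filenames tend to cluster on common prefixes. Call Files.createDirectories on the parent of the returned path before writing.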
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.

Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.

Deleting non-empty directories in Java

Supposing I have a File f that represents a directory, then f.delete() will only delete the directory if it is empty. I've found a couple of examples online that use File.listFiles() or File.list() to get all the files in the directory and then recursively traverse the directory structure and delete all the files. However, since it's possible to create infinitely recursive directory structures (in both Windows and Linux, with symbolic links), presumably programs written in this style might never terminate.
So, is there a better way to write such a program so that it doesn't fall into these pitfalls? Do I need to keep track of everywhere I've traversed and make sure I don't go around in circles or is there a nicer way?
Update: In response to some of the answers (thanks guys!) - I'd rather the code didn't follow symbolic links and stayed within the directory it was supposed to delete. Can I rely on the Commons-IO implementation to do that, even in the Windows case?

If you really want your recursive directory deletion to follow through symbolic links, then I don't think there is any platform independent way of doing so without keeping track of all the directories you have traversed.
However, in pretty much every case I can think of you would just want to delete the actual symbolic link pointing to the directory rather than recursively following through the symbolic link.
If this is the behaviour you want then you can use the FileUtils.deleteDirectory method in Apache Commons IO.

Try Apache Commons IO for a tested implementation.
However, I don't think it handles the infinite-recursion problem.

File.getCanonicalPath() will tell you the “real” name of the file, including resolved symlinks. If, while scanning, you come across a directory you already know (because you stored it in a Map), bail out.
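A small sketch of that bookkeeping, using a Set of canonical paths rather than a Map since only membership matters here. CycleSafeWalker is a made-up name, and the sketch only collects directories; a real deleter would act on each one as it goes:

```java
import java.io.File;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical walker that visits each real directory at most once.
class CycleSafeWalker {

    /** Returns the canonical paths of all directories reachable from start. */
    static Set<String> walk(File start) throws IOException {
        Set<String> seen = new HashSet<String>();
        visit(start, seen);
        return seen;
    }

    private static void visit(File dir, Set<String> seen) throws IOException {
        String canonical = dir.getCanonicalPath();
        if (!seen.add(canonical)) {
            return; // already visited under another name: bail out
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File child : children) {
            if (child.isDirectory()) {
                visit(child, seen);
            }
        }
    }
}
```

Because a link that loops back to an ancestor resolves to a canonical path that is already in the set, the recursion stops there instead of descending forever.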

If you could know which files are symlinks, you could just skip over those.
Before Java 7 there is unfortunately no "clean" way of detecting symlinks in Java (Java 7 adds Files.isSymbolicLink). Check out this pure Java workaround or this one involving native code.

At least under Mac OS X, deleting a symbolic link to a directory does not delete the directory itself, so the link can be deleted even if the target directory is not empty.
I assume this holds for most POSIX operating systems. And as far as I know, links under Windows are also just files, and can be deleted as such from a Java program.
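If Java 7 is available, here is a hedged sketch of a recursive delete that relies on exactly this behaviour. Files.walkFileTree does not follow symbolic links unless asked to, so a link to a directory is handed to visitFile and removed as a link, leaving its target untouched. TreeDeleter is an illustrative name, not a library class:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: deletes a directory tree without following symbolic links.
class TreeDeleter {

    static void deleteTree(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                    throws IOException {
                // Regular files and symbolic links (even to directories) land here.
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc)
                    throws IOException {
                if (exc != null) {
                    throw exc; // listing the directory failed, so do not pretend it is empty
                }
                Files.delete(dir); // empty by now, because children are visited first
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Since links are never descended into, the walk stays inside the directory it was asked to delete and cannot loop.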
