Use the Checkstyle API without providing a java.io.File - java

Is there a way to use the Checkstyle API without providing a java.io.File?
Our app already has the file contents in memory (these aren't read from a local file, but from another source), so it
seems inefficient to me to have to create a temporary file and write the in-memory contents to it just to throw it away.
I've looked into using in-memory file systems to circumvent this, but it seems java.io.File is always bound to the
actual file system. Obviously I have no way of testing whether performance would be better; I just wanted to ask whether Checkstyle supports such a use case.

There is no clean way to do this. I recommend creating an issue at Checkstyle, expanding more on your process, and asking for a way to integrate it with Checkstyle.
Files are needed for our support of caching: we skip reading and processing a file if it is in the cache and has not changed since the last run. The cache process is intertwined with the rest of the pipeline, which is why no non-file route exists. Even without a physical file, Checkstyle processes the contents of files through FileText, which again needs a File, if only as a file-name reference, alongside the lines of the file in a List.
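For what it's worth, a minimal sketch of the FileText detail mentioned above: it has a constructor that accepts a File (used purely as a name reference, never read from disk) together with the lines in a List. This only builds the FileText; the surrounding Checker pipeline still expects real files, which is why no clean end-to-end route exists.

import java.io.File;
import java.util.Arrays;
import java.util.List;

import com.puppycrawl.tools.checkstyle.api.FileText;

public class InMemoryFileText {
    public static void main(String[] args) {
        // The File serves only as a name reference; nothing is read from disk.
        File placeholder = new File("InMemorySource.java");
        List<String> lines = Arrays.asList(
                "public class InMemorySource {",
                "}");
        FileText text = new FileText(placeholder, lines);
        System.out.println(text.getFile().getName()
                + ": " + text.toLinesArray().length + " lines");
    }
}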

Related

Delete transferred logs with logstash using sincedb, or use an alternative solution

I want to move log files from a local directory to an elasticsearch client using logstash.
I want to remove the transferred logs (or alternatively alter their name),
in order to keep a reasonable log directory size.
I already understand that there's no built-in functionality in logstash for that, and I wondered whether I can use the sincedb file to work out which files have been completely processed and transferred, because I could also consider writing code to handle that.
If that is not possible, I could also use a completely different solution instead of logstash.
To sum it up:
Is there a way to understand which files logstash has finished processing using the sincedb file?
If the answer to the previous question is no, is there another tool which could replace logstash in this case? I don't use any of logstash's parsing abilities; I only read from a local directory and pass the contents to elasticsearch.
The %{path} field will contain the name of the file the current event was read from, if that helps.
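To make the sincedb idea concrete, here is a rough sketch of the kind of code one could write: compare the byte offset recorded in sincedb against the file's current size. This assumes the classic four-column sincedb format (inode, major device, minor device, position) and a Unix file system; newer logstash versions add more columns, and the paths and deletion policy here are purely illustrative.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class SincedbCheck {
    public static void main(String[] args) throws IOException {
        // Parse sincedb: assume each line is "inode major_dev minor_dev position".
        Map<Long, Long> offsetByInode = new HashMap<>();
        for (String line : Files.readAllLines(
                Paths.get("/var/lib/logstash/.sincedb"), StandardCharsets.UTF_8)) {
            String[] cols = line.trim().split("\\s+");
            offsetByInode.put(Long.parseLong(cols[0]), Long.parseLong(cols[3]));
        }
        // A file is fully shipped once the recorded offset reaches its size.
        Path logFile = Paths.get("/var/log/app/app.log");
        long inode = (Long) Files.getAttribute(logFile, "unix:ino");
        Long shipped = offsetByInode.get(inode);
        if (shipped != null && shipped >= Files.size(logFile)) {
            Files.delete(logFile); // illustrative policy: remove fully shipped files
        }
    }
}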

Automatically Delete Temporary Files using "new" Java File API

After years of coding with the old File API, I'm finally ready to hop onto the whole Path/Paths train. For the most part, this has gone smoothly; however, I'm stumped on this particular aspect: temporary files.
The documentation on java.nio.Files#createTempFile says:
As with the File.createTempFile methods, this method is only part of a temporary-file facility. Where used as work files, the resulting file may be opened using the DELETE_ON_CLOSE option so that the file is deleted when the appropriate close method is invoked. Alternatively, a shutdown-hook, or the File.deleteOnExit() mechanism may be used to delete the file automatically.
I don't see where the DELETE_ON_CLOSE option is supposed to be specified. Using a shutdown hook is incredibly inconvenient (unless I'm thinking of it wrong). In an effort to avoid using both Path objects and File objects, I am looking for a solution similar to File.deleteOnExit() for the Path object, but obviously one that doesn't require a Path.toFile().[...].toPath() sort of calling pattern.
What is the correct way to implement "self-destructing" temporary files using the java.nio.Files API?
You set that option when you write, for example:
Path myTempFile = Files.createTempFile(...);
Files.write(myTempFile, ..., StandardOpenOption.DELETE_ON_CLOSE);
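Filled in, a runnable version of that pattern might look like this (prefix, suffix, and contents are placeholders):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempFileDemo {
    public static void main(String[] args) throws IOException {
        Path myTempFile = Files.createTempFile("demo", ".tmp");
        // The file is opened for writing with DELETE_ON_CLOSE, so it is
        // removed as soon as Files.write closes the underlying stream.
        Files.write(myTempFile,
                "scratch data".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.DELETE_ON_CLOSE);
        System.out.println(Files.exists(myTempFile)); // prints: false
    }
}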

How do I append files to a .tar archive in Java?

I would like to create a tar archive in Java. I have files which are constantly being created and I'd like a worker thread to take a reference to those files from a queue and copy them into the archive.
I tried using the Apache Commons Compress library's TarArchiveOutputStream to do this, but I do not wish to keep the archive open for the entire duration of the program (unless I finalize the archive, it can become corrupted, so I'd rather append to it in batches), and I haven't found a good way to append to an existing tar archive with their library. (They do have a ChangeSetPerformer class, but it essentially creates a new tar and has to copy the old one entirely, which isn't good for me performance-wise.)
I also need the library not to have a low limit on the size of the archive (i.e. 4 GB or so is not enough), and I'd rather avoid having to actually compress the archive.
Any suggestions would be greatly appreciated!
Thank you.
You are running into a limitation of tar: http://en.wikipedia.org/wiki/Tar_(file_format)#Random_access
Because of it, it is hard to add or remove single files without copying the whole archive.
I use a library called Java Tar: http://www.trustice.com/java/tar/
It has worked for me. In that package, look for:
http://www.gjt.org/javadoc/com/ice/tar/TarArchive.html#writeEntry(com.ice.tar.TarEntry, boolean)
which lets you add an entry to the file without using a stream at the user level. I don't know about file size limits, but it would be a simple matter to test this aspect.
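For the Commons Compress route the question mentions, a commonly used workaround for batch appends is to seek back over the archive's end-of-archive marker and write new entries from there. A rough sketch, assuming an uncompressed archive terminated by the standard two 512-byte zero blocks (archives padded further by other tools would need the seek position adjusted); this is not an official Commons Compress feature:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarAppender {
    // Appends one file to an existing, uncompressed tar archive by
    // overwriting the end-of-archive marker (two 512-byte zero blocks).
    static void append(File archive, File fileToAdd) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(archive, "rw")) {
            raf.seek(raf.length() - 1024); // back over the two zero blocks
            try (TarArchiveOutputStream tar =
                         new TarArchiveOutputStream(new FileOutputStream(raf.getFD()))) {
                TarArchiveEntry entry = new TarArchiveEntry(fileToAdd, fileToAdd.getName());
                tar.putArchiveEntry(entry);
                Files.copy(fileToAdd.toPath(), tar);
                tar.closeArchiveEntry();
            } // close() re-writes the end-of-archive marker
        }
    }
}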

Java content APIs for a large number of files

Does anyone know of any open-source Java libraries that provide features for handling a large number of files (write/read) on disk? I am talking about 2-4 million files (most of them PDF and MS Office documents). It is not a good idea to store all files in a single directory. Instead of re-inventing the wheel, I am hoping this has been done by many people already.
Features I am looking for:
1) Able to write/read files from disk
2) Able to create random directories/sub-directories for new files
3) Provide versioning/audit (optional)
I was looking at the JCR API and it looks promising, but it starts with a workspace and I'm not sure what the performance will be when there are many nodes.
Edit: JCR does look pretty good. I'd suggest trying it out to see how it actually performs for your use-case.
If you're running your system on Windows and notice a horrible n^2 performance hit at some point, you're probably running up against the cost of automatic 8.3 filename generation. Of course, you can disable 8.3 filename generation, but as you pointed out, it would still not be a good idea to store large numbers of files in a single directory.
One common strategy I've seen for handling large numbers of files is to create directories for the first n letters of the filename. For example, document.pdf would be stored in d/o/c/u/m/document.pdf. I don't recall ever seeing a library to do this in Java, but it seems pretty straightforward. If necessary, you can create a database to store the lookup table (mapping keys to the uniformly-distributed random filenames), so you won't have to rebuild your index every time you start up. If you want to get the benefit of automatic deduplication, you could hash each file's content and use that checksum as the filename (but you would also want to add a check so you don't accidentally discard a file whose checksum matches an existing file even though the contents are actually different).
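A minimal sketch of that scheme, combining the nested-directory layout with checksum-based filenames (class and method names are made up for illustration):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShardedStore {
    private final Path root;

    public ShardedStore(Path root) {
        this.root = root;
    }

    // Stores content under a path derived from its SHA-256 checksum, e.g.
    // root/d/e/a/deadbeef..., so no single directory grows too large.
    public Path store(byte[] content) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        String name = hex.toString();
        Path dir = root.resolve(name.substring(0, 1))
                       .resolve(name.substring(1, 2))
                       .resolve(name.substring(2, 3));
        Files.createDirectories(dir);
        Path target = dir.resolve(name);
        // Content-addressed: identical files collapse to one entry. A real
        // version should compare contents here to guard against collisions.
        if (!Files.exists(target)) {
            Files.write(target, content);
        }
        return target;
    }
}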
Depending on the sizes of the files, you might also consider storing the files themselves in a database--if you do this, it would be trivial to add versioning, and you wouldn't necessarily have to create random filenames because you could reference them using an auto-generated primary key.
Combine the functionality in the java.io package with your own custom solution.
The java.io package can write and read files from disk and create arbitrary directories or sub-directories for new files. There is no external API required.
The versioning or auditing would have to be provided with your own custom solution. There are many ways to handle this, and you probably have a specific need that needs to be filled. Especially if you're concerned about the performance of an open-source API, it's likely that you will get the best result by simply coding a solution that specifically fits your needs.
It sounds like your module should scan all the files on startup and form an index of everything that's available. Based on the method used for sharing and indexing these files, it can rescan the files every so often or you can code it to receive a message from some central server when a new file or version is available. When someone requests a file or provides a new file, your module will know exactly how it is organized and exactly where to get or put the file within the directory tree.
It seems that it would be far easier to just engineer a solution specific to your needs.
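A minimal sketch of the startup scan described above, using only java.nio.file (class and method names are made up for illustration):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Stream;

public class FileIndex {
    private final Map<String, Path> byName = new ConcurrentHashMap<>();

    // Scan the tree once at startup so later lookups never touch the disk.
    public void rescan(Path root) throws IOException {
        byName.clear();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile)
                 .forEach(p -> byName.put(p.getFileName().toString(), p));
        }
    }

    public Path lookup(String fileName) {
        return byName.get(fileName);
    }
}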

How to safely update a file that has many readers and one writer?

I have a set of files. The set of files is read-only off an NTFS share, and thus can have many readers. Each file is updated occasionally by one writer that has write access.
How do I ensure that:
If the write fails, that the previous file is still readable
Readers cannot hold up the single writer
I am using Java and my current solution is for the writer to write to a temporary file, then swap it out with the existing file using File.renameTo(). The problem is that on NTFS, renameTo fails if the target file already exists, so you have to delete it yourself. But if the writer deletes the target file and then fails (computer crash), I don't have a readable file.
nio's FileLock only works within the same JVM, so it is useless to me.
How do I safely update a file with many readers using Java?
According to the JavaDoc:
This file-locking API is intended to map directly to the native locking facility of the underlying operating system. Thus the locks held on a file should be visible to all programs that have access to the file, regardless of the language in which those programs are written.
I don't know if this is applicable, but if you are running in a pure Vista/Windows Server 2008 solution, I would use TxF (transactional NTFS) and then make sure you open the file handle and perform the file operations by calling the appropriate file APIs through JNI.
If that is not an option, then I think you need to have some sort of service that all clients access which is responsible to coordinate the reading/writing of the file.
On a Unix system, I'd remove the file and then open it for writing. Anybody who had it open for reading would still see the old one, and once they'd all closed it, it would vanish from the file system. I don't know if NTFS has similar semantics, although I've heard that it's loosely based on BSD's file system, so maybe it does.
Something that should always work, no matter the OS, is changing your client software.
If this is an option, you could have a file "settings1.ini", and when you want to change it, you create a file "settings2.ini.wait", write your stuff to it, rename it to "settings2.ini", and then delete "settings1.ini".
Your changed client software would simply always check for settings2.ini if it read settings1.ini last, and vice versa.
This way you always have a working copy.
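On Java 7+, the two-name dance can also be avoided entirely: java.nio.file.Files.move, unlike File.renameTo on NTFS, can replace an existing target and can be asked to do so atomically. A minimal sketch (names illustrative; the AtomicMoveNotSupportedException case is left unhandled):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeUpdate {
    // Write the new content to a temp file in the same directory (so the
    // rename is not a cross-volume copy), then swap it over the live file.
    // Readers see either the complete old file or the complete new one.
    public static void replace(Path target, String newContent) throws IOException {
        Path temp = Files.createTempFile(target.getParent(), "settings", ".tmp");
        Files.write(temp, newContent.getBytes(StandardCharsets.UTF_8));
        Files.move(temp, target,
                StandardCopyOption.REPLACE_EXISTING,
                StandardCopyOption.ATOMIC_MOVE);
    }
}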
There might be no need for locking. I am not too familiar with the FS API on Windows, but since NTFS supports both hard and soft links (AFAIK), you can try this if your setup allows it:
Use a hard or soft link to point to the actual file, and give the file itself a different name. Let everyone access the file using the link's name.
Write the new file under a different name, in the same folder.
Once it is finished, have the link point to the new file. Optimally, Windows would allow you to create the new link replacing the existing link in one atomic operation; then the link would always identify a valid file, either the old or the new one. At worst, you'd have to delete the old link first and then create the link to the new file, in which case there'd be a short time span in which a program would not be able to locate the file. (Also, Mac OS X offers an "ExchangeObjects" function that allows you to swap two items atomically; maybe Windows offers something similar.)
This way, any program that has the old file already opened will continue to access the old one, and you won't get in its way creating the new one. Once an app notices the existence of the new version, it can close the current file and reopen it, thereby getting access to the new version.
I don't know, however, how to create links in Java; maybe you have to use some native API for that.
I hope this helps anyway.
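For what it's worth, since Java 7, java.nio.file.Files can create both kinds of link without native code. A sketch of the relink step (file names illustrative; note the brief window between delete and relink that this answer already acknowledges):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RelinkDemo {
    public static void main(String[] args) throws IOException {
        Path newVersion = Paths.get("data-v2.bin"); // freshly written file
        Path stableName = Paths.get("data.bin");    // name all readers use
        Files.deleteIfExists(stableName);           // brief window with no file
        Files.createLink(stableName, newVersion);   // hard link; or createSymbolicLink
    }
}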
I have been dealing with something similar recently. If you are running Java 5, perhaps you could consider using NIO file locks in conjunction with a ReentrantReadWriteLock? Make sure all code referencing the FileChannel object ALSO references the ReentrantReadWriteLock. This way the NIO locks it at a per-VM level while the reentrant lock locks it at a per-thread level.
reentrantReadWriteLock.writeLock().lock(); // or readLock(); there is no plain lock()
FileLock fileLock = fileChannel.lock(position, size, shared); // OS-level, per-VM lock
try {
    // do stuff
} finally {
    fileLock.release();
    reentrantReadWriteLock.unlock();
}
Of course, some exception handling would be required.
