Uniquely identify a file in Java

I'm on Linux and my Java application is not intended to be portable.
I'm looking for a way to identify a file uniquely in Java. I can use the statfs syscall, since the pair (f_fsid, ino) uniquely identifies a file (not only within a single file system), as specified here: http://man7.org/linux/man-pages/man2/statfs.2.html
The question is whether it is possible to extract the fsid from Java directly, so I can avoid writing a JNI function.
The inode can be extracted with NIO, but what about the fsid? The inode and fsid come from different structures and are obtained by different syscalls...

This Java example demonstrates how to get the Unix inode number of a file.
import java.nio.file.*;
import java.nio.file.attribute.*;

public class MyFile {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("MyFile.java");
        BasicFileAttributes attr = Files.readAttributes(path, BasicFileAttributes.class);
        Object fileKey = attr.fileKey();
        String s = fileKey.toString();
        String inode = s.substring(s.indexOf("ino=") + 4, s.indexOf(")"));
        System.out.println("Inode: " + inode);
    }
}
The output
$ java MyFile
Inode: 664938
$ ls -i MyFile.java
664938 MyFile.java
credit where credit is due: https://www.javacodex.com/More-Examples/1/8
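If parsing the toString() of fileKey() feels fragile, the same numbers are also exposed through the "unix" attribute view on Linux/Unix JDKs. A minimal sketch (the class name is mine); note that "unix:dev" corresponds to st_dev, which identifies the device, not the statfs f_fsid the question asks about, so this only gets you part of the way without JNI:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UnixAttrs {
    // Reads the inode number through the "unix" attribute view
    // (available on Linux/Unix JDKs only) — no string parsing of fileKey().
    static long inodeOf(Path path) throws IOException {
        return (Long) Files.getAttribute(path, "unix:ino");
    }

    // Reads st_dev; this identifies the device, which is related to but
    // not the same thing as statfs's f_fsid.
    static long deviceOf(Path path) throws IOException {
        return (Long) Files.getAttribute(path, "unix:dev");
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(".");
        System.out.println("ino=" + inodeOf(path) + " dev=" + deviceOf(path));
    }
}
```

The (dev, ino) pair is what fileKey() encodes anyway, so this is the same information retrieved in a typed form.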

I would suggest Git's method of hashing the file contents. This is robust against copying and renaming.
Java is supposed to be platform independent, so using Unix-specific methods may not be what you want.

Related

TrueZip - How to get the Size of a Folder within a Zip Archive

Is there a way to get the size of a folder with TrueZip without computing it
myself recursively? I'm concerned about the runtime, since I'm dealing with archives that contain lots of files.
Using TrueZip 7.7.9.
OK, here is a simple test using only the standard Java 7 API; it uses the bundled JSR 203 implementation for zips:
import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.util.Collections;

public final class Test
{
    public static void main(final String... args)
        throws IOException
    {
        final Path zip = Paths.get("/home/fge/t.zip");
        final URI uri = URI.create("jar:" + zip.toUri());

        try (
            final FileSystem fs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap());
            final DirectoryStream<Path> stream = Files.newDirectoryStream(fs.getPath("/"));
        ) {
            for (final Path entry: stream)
                System.out.println(Files.size(entry));
        }
    }
}
The zip above only contains files at the top level, not directories.
This means that you can use this API to compute the size of files in a zip correctly, without needing to uncompress anything.
You use Java 7; therefore, what you probably want is a FileVisitor which computes the size of all regular files for you.
Here I have made a crude hack which uses Files.size(); note that in a FileVisitor, when you visit a filesystem entry which is not a directory, you get an instance of BasicFileAttributes from which you can retrieve the size().
Javadoc here; and use Files.walkFileTree().
With Java 8 this is much simpler: you could use Files.find() instead.
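The FileVisitor approach sketched above can be combined with the zip filesystem provider. A self-contained sketch (the class name and temp paths are mine; it builds a small demo zip first so it can run anywhere):

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public class ZipSize {
    // Sums the uncompressed sizes of all regular entries in a zip,
    // using the JDK's zip filesystem provider — no extraction needed.
    static long totalSize(Path zip) throws IOException {
        final long[] total = {0};
        try (FileSystem fs = FileSystems.newFileSystem(
                URI.create("jar:" + zip.toUri()), new HashMap<String, String>())) {
            Files.walkFileTree(fs.getPath("/"), new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    total[0] += attrs.size(); // size of this entry, uncompressed
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return total[0];
    }

    public static void main(String[] args) throws IOException {
        // Build a small zip to demonstrate (paths are illustrative).
        Path zip = Files.createTempDirectory("demo").resolve("t.zip");
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // create the zip if it does not exist
        try (FileSystem fs = FileSystems.newFileSystem(URI.create("jar:" + zip.toUri()), env)) {
            Files.write(fs.getPath("/a.txt"), new byte[10]);
            Files.write(fs.getPath("/b.txt"), new byte[20]);
        }
        System.out.println(totalSize(zip)); // → 30 (10 + 20 bytes)
    }
}
```

Unlike the DirectoryStream version, walkFileTree also descends into directories inside the zip, so it works for nested archives too.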

How to use Files.getFileStore() for substed drive (on Windows)?

When invoking Files.getFileStore() on a substed drive (on Windows), this results in following error:
The directory is not a subdirectory of the root directory
For example with:
subst P: C:\temp
running:
public static void main(String[] args) throws IOException {
    final Path dir = Paths.get("P:/sub");
    final FileStore fileStore = Files.getFileStore(dir);
    fileStore.isReadOnly();
}
results in:
Exception in thread "main" java.nio.file.FileSystemException: P:\sub: The directory is not a subdirectory of the root directory.
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileStore.create(WindowsFileStore.java:92)
at sun.nio.fs.WindowsFileSystemProvider.getFileStore(WindowsFileSystemProvider.java:482)
at java.nio.file.Files.getFileStore(Files.java:1411)
at utils.FileStoreMain.main(FileStoreMain.java:16)
How to fix this problem and receive the appropriate FileStore for P:?
Have a look at this bug report JDK-8034057 and at a related answer from Alan Bateman.
The problem is that a "substed drive" is not a file store; it just associates a drive letter with a path on an existing drive.
You did:
subst p: c:\temp
which means, in fact, that the real filestore of your p:\sub is the drive associated with c:.
Note: that's just a hypothesis; I don't actually run Windows. But if you try to iterate over the file stores (i.e., by calling .getFileSystem().getFileStores() on your Path instance), then P: will not appear.
Now, the question remains as to how to obtain the real filestore, if it's possible at all. Maybe a FileAttributeView exists which can provide you with this information; try and see what attribute views are available to you, and their parameters, by using this code:
// using some Path instance named path...
final FileSystem fs = path.getFileSystem();
final Set<String> viewNames = fs.supportedFileAttributeViews();
for (final String viewName: viewNames) {
    System.out.println("View " + viewName + ':');
    System.out.println(Files.readAttributes(path, viewName + ":*"));
}
Maybe there exists a view with the information you are looking for... No guarantee though.

How to efficiently test if files with a matching filename (regex or wildcard) exists in a directory?

I am searching for an efficient way to test whether files exist whose names match a certain pattern.
Examples using wildcards:
????.*
???????.*
*.png
*.jpg
Examples using regular expressions:
[012]{4}.*
[012]{7}.*
The problem is that the directory I have to test contains up to 500,000 files.
The only way I know to perform such tests is to use the methods of the File class:
String[] list()
String[] list(FilenameFilter filter)
File[] listFiles()
File[] listFiles(FileFilter filter)
File[] listFiles(FilenameFilter filter)
The problem is that basically they are all implemented the same way: first they call list() to get all available files, and then they apply the filter to it.
Imagine what happens if we apply this to a folder containing 500,000 files...
Is there any alternative in Java for retrieving the name of the first matching file in a directory without having to enumerate all of them?
If JNI is the only option: is there a library that can do this and comes with pre-compiled binaries for the six major platforms (Linux, Windows and OS X, each 32- and 64-bit)?
I think that you are confused. As far as I know, no current OS supports pattern listing/searching in its filesystem interface. All utilities that support patterns do so by listing the directory (e.g. by using readdir() on POSIX systems) and then performing string matching.
Therefore, there is no generic low-level way to do that more efficiently in Java or any other language. That said, you should investigate at least the following approaches:
making sure that you only retrieve the file names and that you do not probe the file nodes themselves for additional metadata (e.g. their size), as that would cause additional operations for each file.
retrieving the file list once and caching it, perhaps in association with a filesystem event notification interface for updates (e.g. JNotify or the Java 7 WatchService interface).
EDIT:
I had a look at my Java implementation. The only obvious drawback in the methods of the File class is that listing a directory does not stop once a match is found. That would only matter, however, if you only perform the search once - otherwise it would still be far more efficient to cache the full directory list.
If you can use a relatively recent Java version, you might want to have a look at the Java NIO classes (1, 2) which do not seem to have the same weakness.
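To make the NIO suggestion concrete: Files.newDirectoryStream accepts a glob and iterates lazily, so you can stop after the first hit instead of materializing the whole listing. A minimal sketch (class and method names are mine):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FirstMatch {
    // Returns the first entry matching the glob, or null if none matches.
    // The stream is lazy, so the scan stops as soon as a match is found —
    // nothing like list()'s full enumeration happens up front.
    static Path firstMatching(Path dir, String glob) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                return entry; // stop at the first hit
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("scan");
        Files.createFile(dir.resolve("0001.png"));
        Path hit = firstMatching(dir, "*.png");
        System.out.println(hit == null ? "no match" : hit.getFileName().toString());
        // → 0001.png
    }
}
```

Whether the underlying readdir() calls are actually cheaper than list() depends on the OS, but at least no giant array is built in memory.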
This takes about 1 minute on my machine (which is somewhat old):
import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Main {
    // Recursively collects files whose names match the pattern.
    static void match(File dir, Pattern pattern, List<File> matching) {
        File[] files = dir.listFiles();
        if (files == null) {
            System.out.println(dir + " is strange!");
            return;
        }
        for (File file : files)
            if (file.isDirectory()) match(file, pattern, matching);
            else if (file.isFile()) {
                Matcher matcher = pattern.matcher(file.getName());
                if (matcher.matches()) {
                    matching.add(file);
                    //System.out.println(file + "************");
                }
            }
    }

    // Creates n small test files named 0.foo .. (n-1).foo in dir.
    static void makeFiles(File dir, int n) throws IOException {
        for (int i = 0; i < n; i++) {
            File file = new File(dir, i + ".foo");
            FileWriter fw = new FileWriter(file);
            fw.write(1);
            fw.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File("data");
        final int n = 500000;
        //makeFiles(dir, n);
        long t0 = System.currentTimeMillis();
        Pattern pattern = Pattern.compile(".*\\.foo");
        List<File> matching = new LinkedList<File>();
        match(dir, pattern, matching);
        long t1 = System.currentTimeMillis();
        System.out.println("found: " + matching.size());
        System.out.println("elapsed time: " + (t1 - t0) / 1000.);
        System.out.println("files/second: " + n / ((t1 - t0) / 1000.));
    }
}
I think you are putting the proverbial cart before the horse.
As Knuth said, premature optimization is the root of all evil. Have you tried using the FileFilter method and found that it is too slow for the application?
Why do you have so many files in one folder? Perhaps the more beneficial approach would be to split those files up in some manner instead of having them all in one folder.

How to convert a Hadoop Path object into a Java File object

Is there a way to convert a valid and existing Hadoop Path object into a useful Java File object? Is there a nice way of doing this, or do I need to bludgeon the code into submission? The more obvious approaches don't work, and it seems like it would be a common bit of code.
void func(Path p) {
    if (p.isAbsolute()) {
        File f = new File(p.toURI());
    }
}
This doesn't work because Path's toUri() returns a URI with the "hdfs" scheme, and Java's File(URI uri) constructor only recognizes the "file" scheme.
Is there a way to get Path and File to work together?
Ok, how about a specific limited example.
Path[] paths = DistributedCache.getLocalCacheFiles(job);
DistributedCache is supposed to provide a localized copy of a file, but it returns a Path. I assume that DistributedCache makes a local copy of the file, on the same disk. Given this limited example, where HDFS is hopefully not in the equation, is there a way for me to reliably convert a Path into a File?
I recently had this same question, and there really is a way to get a file from a path, but it requires downloading the file temporarily. Obviously, this won't be suitable for many tasks, but if time and space aren't essential for you, and you just need something to work using files from Hadoop, do something like the following:
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public final class PathToFileConverter {
    public static File makeFileFromPath(Path some_path, Configuration conf) throws IOException {
        FileSystem fs = FileSystem.get(some_path.toUri(), conf);
        File temp_data_file = File.createTempFile(some_path.getName(), "");
        temp_data_file.deleteOnExit();
        fs.copyToLocalFile(some_path, new Path(temp_data_file.getAbsolutePath()));
        return temp_data_file;
    }
}
If you get a LocalFileSystem
final LocalFileSystem localFileSystem = FileSystem.getLocal(configuration);
You can pass your hadoop Path object to localFileSystem.pathToFile
final File localFile = localFileSystem.pathToFile(<your hadoop Path>);
Not that I'm aware of.
To my understanding, a Path in Hadoop represents an identifier for a node in their distributed filesystem. This is a different abstraction from a java.io.File, which represents a node on the local filesystem. It's unlikely that a Path could even have a File representation that would behave equivalently, because the underlying models are fundamentally different.
Hence the lack of translation. I presume by your assertion that File objects are "[more] useful", you want an object of this class in order to use existing library methods? For the reasons above, this isn't going to work very well. If it's your own library, you could rewrite it to work cleanly with Hadoop Paths and then convert any Files into Path objects (this direction works as Paths are a strict superset of Files). If it's a third party library then you're out of luck; the authors of that method didn't take into account the effects of a distributed filesystem and only wrote that method to work on plain old local files.

Changing the current working directory in Java?

How can I change the current working directory from within a Java program? Everything I've been able to find about the issue claims that you simply can't do it, but I can't believe that that's really the case.
I have a piece of code that opens a file using a hard-coded relative file path from the directory it's normally started in, and I just want to be able to use that code from within a different Java program without having to start it from within a particular directory. It seems like you should just be able to call System.setProperty( "user.dir", "/path/to/dir" ), but as far as I can figure out, calling that line just silently fails and does nothing.
I would understand if Java didn't allow you to do this, if it weren't for the fact that it allows you to get the current working directory, and even allows you to open files using relative file paths....
There is no reliable way to do this in pure Java. Setting the user.dir property via System.setProperty() or java -Duser.dir=... does seem to affect subsequent creations of Files, but not e.g. FileOutputStreams.
The File(String parent, String child) constructor can help if you build up your directory path separately from your file path, allowing easier swapping.
An alternative is to set up a script to run Java from a different directory, or use JNI native code as suggested below.
The relevant OpenJDK bug was closed in 2008 as "will not fix".
If you run your legacy program with ProcessBuilder, you will be able to specify its working directory.
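A minimal sketch of the ProcessBuilder approach (the command and directory are examples; "pwd" is POSIX-only):

```java
import java.io.File;
import java.io.IOException;

public class RunInDir {
    // Launches a child process with an explicit working directory.
    // The JVM's own working directory is left untouched — only the
    // child sees the new one.
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("pwd"); // any command; "pwd" is POSIX-only
        pb.directory(new File("/tmp"));                // working directory for the child
        pb.inheritIO();                                // forward the child's output
        Process p = pb.start();
        System.out.println("exit: " + p.waitFor());
    }
}
```

This sidesteps the user.dir problem entirely: the legacy program's relative paths resolve against the directory you pass to directory(), set by the OS when the process starts.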
There is a way to do this using the system property "user.dir". The key part to understand is that getAbsoluteFile() must be called (as shown below) or else relative paths will be resolved against the default "user.dir" value.
import java.io.*;

public class FileUtils
{
    public static boolean setCurrentDirectory(String directory_name)
    {
        boolean result = false;  // Boolean indicating whether directory was set
        File directory;          // Desired current working directory

        directory = new File(directory_name).getAbsoluteFile();
        if (directory.exists() || directory.mkdirs())
        {
            result = (System.setProperty("user.dir", directory.getAbsolutePath()) != null);
        }
        return result;
    }

    public static PrintWriter openOutputFile(String file_name)
    {
        PrintWriter output = null;  // File to open for writing
        try
        {
            output = new PrintWriter(new File(file_name).getAbsoluteFile());
        }
        catch (Exception exception) {}
        return output;
    }

    public static void main(String[] args) throws Exception
    {
        FileUtils.openOutputFile("DefaultDirectoryFile.txt");
        FileUtils.setCurrentDirectory("NewCurrentDirectory");
        FileUtils.openOutputFile("CurrentDirectoryFile.txt");
    }
}
It is possible to change the working directory by using JNA/JNI to make calls to libc. The JRuby folks have a handy Java library for making POSIX calls called jnr-posix. Here's the Maven info.
As mentioned, you can't change the CWD of the JVM, but if you launch another process using Runtime.exec(), you can use the overloaded method that lets you specify the working directory. This is not really for running your Java program in another directory, but in many cases where you need to launch another program, like a Perl script for example, you can specify the working directory of that script while leaving the working directory of the JVM unchanged.
See Runtime.exec javadocs
Specifically,
public Process exec(String[] cmdarray,String[] envp, File dir) throws IOException
where dir is the working directory to run the subprocess in
If I understand correctly, a Java program starts with a copy of the current environment variables. Any changes via System.setProperty(String, String) are modifying the copy, not the original environment variables. Not that this provides a thorough reason as to why Sun chose this behavior, but perhaps it sheds a little light...
The working directory is an operating system feature (set when the process starts).
Why don't you just pass your own System property (-Dsomeprop=/my/path) and use that in your code as the parent of your File:
File f = new File ( System.getProperty("someprop"), myFilename)
The smarter/easier thing to do here is to just change your code so that, instead of opening the file assuming it exists in the current working directory (I assume you are doing something like new File("blah.txt")), you build the path to the file yourself.
Let the user pass in the base directory, read it from a config file, fall back to user.dir if the other properties can't be found, etc. But it's a whole lot easier to improve the logic in your program than it is to change how environment variables work.
I have tried to invoke
String oldDir = System.setProperty("user.dir", currdir.getAbsolutePath());
It seems to work. But
File myFile = new File("localpath.ext");
InputStream openit = new FileInputStream(myFile);
throws a FileNotFoundException, although
myFile.getAbsolutePath()
shows the correct path.
I have read this. I think the problem is:
Java knows the current directory with the new setting, but the file handling is done by the operating system, which unfortunately does not know the newly set current directory.
The solution may be:
File myFile = new File(System.getProperty("user.dir"), "localpath.ext");
This creates an absolute File object using the current directory known to the JVM. But since that code has to appear in every class that opens files, it requires changing reused code.
You can use
new File("relative/path").getAbsoluteFile()
after
System.setProperty("user.dir", "/some/directory")
System.setProperty("user.dir", "C:/OtherProject");
File file = new File("data/data.csv").getAbsoluteFile();
System.out.println(file.getPath());
Will print
C:\OtherProject\data\data.csv
You can change the process's actual working directory using JNI or JNA.
With JNI, you can use native functions to set the directory. The POSIX method is chdir(). On Windows, you can use SetCurrentDirectory().
With JNA, you can wrap the native functions in Java binders.
For Windows:
private static interface MyKernel32 extends Library {
    public MyKernel32 INSTANCE = (MyKernel32) Native.loadLibrary("Kernel32", MyKernel32.class);

    /** BOOL SetCurrentDirectory( LPCTSTR lpPathName ); */
    int SetCurrentDirectoryW(char[] pathName);
}
For POSIX systems:
private interface MyCLibrary extends Library {
    MyCLibrary INSTANCE = (MyCLibrary) Native.loadLibrary("c", MyCLibrary.class);

    /** int chdir(const char *path); */
    int chdir(String path);
}
The other possible answer to this question may depend on why you are opening the file. Is it a property file, or a file holding configuration for your application?
If so, consider loading the file through the classpath loader; that way you can load any file Java has access to.
If you run your commands in a shell, you can write something like java -cp and add any directories you want, separated by ":"; if Java doesn't find something in one directory, it will try the others. That is what I do.
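A sketch of the classpath-loader idea (the directory and resource names are made up; a directory is placed on a URLClassLoader's search path to stand in for a -cp entry):

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ResourceLoad {
    public static void main(String[] args) throws Exception {
        // Simulate a classpath entry: a directory containing app.conf.
        Path dir = Files.createTempDirectory("cp");
        Files.write(dir.resolve("app.conf"), "key=value".getBytes("UTF-8"));

        // The loader resolves names against its own search path, not the
        // process's working directory — which is the point here.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            try (InputStream in = loader.getResourceAsStream("app.conf")) {
                System.out.println(in == null ? "not found" : "found");
                // → found
            }
        }
    }
}
```

In a real application you would typically use getClass().getClassLoader().getResourceAsStream("app.conf") and let the -cp entries supply the directories.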
Use FileSystemView:
private FileSystemView fileSystemView;
private File currentDirectory;
private List<File> fileList;
private List<File> dirList;

fileSystemView = FileSystemView.getFileSystemView();
currentDirectory = new File(".");

// listing currentDirectory
File[] filesAndDirs = fileSystemView.getFiles(currentDirectory, false);
fileList = new ArrayList<File>();
dirList = new ArrayList<File>();
for (File file : filesAndDirs) {
    if (file.isDirectory())
        dirList.add(file);
    else
        fileList.add(file);
}
Collections.sort(dirList);
if (!fileSystemView.isFileSystemRoot(currentDirectory))
    dirList.add(0, new File(".."));
Collections.sort(fileList);

// changing directory
currentDirectory = fileSystemView.getParentDirectory(currentDirectory);
