File.createNewFile() randomly fails


I've built a simple test that creates and deletes a file (the name does not change) in an infinite loop. The test runs for a couple of seconds (sometimes over 77,000 iterations!) and then fails with this exception:
Exception in thread "main" java.io.IOException: Access is denied
at java.io.WinNTFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(Unknown Source)
at DeleteTest.main(DeleteTest.java:11)
Here's the test logic:
final File f = new File(pathname);
while (true) {
final boolean create = f.createNewFile();
if (!create) {
System.out.println("create failed");
} else {
final boolean delete = f.delete();
if (!delete) {
System.out.println("delete failed");
}
}
}
How is this possible? The delete call does not fail; the test would report it. So delete always succeeds but createNewFile fails. This is what MSDN says about the Win32 API function DeleteFile:
The DeleteFile function marks a file for deletion on close. Therefore,
the file deletion does not occur until the last handle to the file is
closed. Subsequent calls to CreateFile to open the file fail with
ERROR_ACCESS_DENIED.
So createNewFile does not close the file? The OpenJDK source shows that the file is closed:
JNIEXPORT jboolean JNICALL
Java_java_io_Win32FileSystem_createFileExclusively(JNIEnv *env, jclass cls,
jstring pathname)
{
jboolean rv = JNI_FALSE;
DWORD a;
WITH_PLATFORM_STRING(env, pathname, path) {
int orv;
int error;
JVM_NativePath((char *)path);
orv = JVM_Open(path, JVM_O_RDWR | JVM_O_CREAT | JVM_O_EXCL, 0666);
if (orv < 0) {
if (orv != JVM_EEXIST) {
error = GetLastError();
// If a directory by the named path already exists,
// return false (behavior of solaris and linux) instead of
// throwing an exception
a = GetFileAttributes(path);
if ((a == INVALID_FILE_ATTRIBUTES) ||
!(a & FILE_ATTRIBUTE_DIRECTORY)) {
SetLastError(error);
JNU_ThrowIOExceptionWithLastError(env, path);
}
}
} else {
JVM_Close(orv);
rv = JNI_TRUE;
}
} END_PLATFORM_STRING(env, path);
return rv;
}
Can anyone explain this behaviour?

I found an explanation while writing the question. I still posted it because I wanted to share what I learned.
My application is not the only process on the system accessing files. The Windows Search indexing service, for example, could open this file because it wants to add it to its index. So could Windows Explorer while it is updating its view. While such a process holds a handle, the deleted file is only marked for deletion, so the next createNewFile fails with "Access is denied".
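Given that explanation, one way to cope is to treat the IOException as transient and retry a bounded number of times. The following is only a sketch under that assumption; the class name CreateRetry, the method createWithRetry, and the attempt and pause parameters are made up for illustration and are not part of the original test.

```java
import java.io.File;
import java.io.IOException;

final class CreateRetry {

    /**
     * Calls File.createNewFile(), retrying when it throws an IOException
     * (for example "Access is denied" while another process still holds
     * a handle to the previously deleted file).
     *
     * @return true if the file was created, false if it already exists
     * @throws IOException the last failure if every attempt failed
     */
    static boolean createWithRetry(File file, int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return file.createNewFile();
            } catch (IOException e) {
                lastFailure = e;
                // Give the other process a moment to close its handle.
                Thread.sleep(pauseMillis);
            }
        }
        throw lastFailure;
    }
}
```

In the test loop, f.createNewFile() would be replaced by something like CreateRetry.createWithRetry(f, 5, 10). A real failure, such as a missing parent directory, still surfaces once the attempts are used up.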

This issue reminds me of a problem I experienced recently with the File.renameTo() method. It is (was?) due to this bug in the JVM:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6213298
A weird workaround is to call System.gc() and to retry renaming the file again (and it works...).
I'm not sure it is linked to your issue, but it may be worth exploring...

Try this:
final File f = new File("file");
while (true) {
final boolean create = f.createNewFile();
if (!create) {
System.out.println("create failed");
} else {
final boolean delete = f.delete();
try {
Thread.sleep(10);
} catch (InterruptedException e) {
System.out.println("...");
}
if (!delete) {
System.out.println("delete failed");
}
}
}
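This way the file has time to be released after the delete before createNewFile is invoked again. Note that a fixed sleep only makes the failure less likely; it does not guarantee that the file has been released.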

Related

FileChannel.open(path, CREATE|CREATE_NEW) without WRITE option throws NoSuchFileException

I had the following code:
@Nonnull
@SneakyThrows
private Pair<InputStream, Long> probeSize(@Nonnull final InputStream image) {
final String tmpId = UUID.randomUUID().toString();
final File probeFile = new File(tmpDir, tmpId + ".jpg");
try (final FileChannel outChannel = FileChannel.open(probeFile.toPath(), CREATE);
final ReadableByteChannel innChannel = Channels.newChannel(image)) {
outChannel.transferFrom(innChannel, 0, Long.MAX_VALUE);
}
final Long fileSize = probeFile.length();
return Pair.of(new FileInputStream(probeFile), fileSize);
}
This code consistently threw the following exception:
Caused by: java.nio.file.NoSuchFileException: /tmp/4bbc9008-e91c-4f18-b0f2-c61eed35066e.jpg
at sun.nio.fs.UnixException.translateToIOException(Unknown Source)
at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(Unknown Source)
Looking at the Javadoc of FileChannel.open(path, options) and the associated StandardOpenOption, I found nothing that suggests that, to create a file, you must also open it for writing.
The only options that work:
FileChannel.open(probeFile.toPath(), CREATE, WRITE)
FileChannel.open(probeFile.toPath(), CREATE_NEW, WRITE)
I only determined this by going through UnixChannelFactory.newFileChannel, where I noticed the following:
UnixChannelFactory:
protected static FileDescriptor open(int dfd,
UnixPath path,
String pathForPermissionCheck,
Flags flags,
int mode)
throws UnixException
{
// map to oflags
int oflags;
if (flags.read && flags.write) {
oflags = O_RDWR;
} else {
oflags = (flags.write) ? O_WRONLY : O_RDONLY;
}
if (flags.write) {
if (flags.truncateExisting)
oflags |= O_TRUNC;
if (flags.append)
oflags |= O_APPEND;
// create flags
if (flags.createNew) {
byte[] pathForSysCall = path.asByteArray();
// throw exception if file name is "." to avoid confusing error
if ((pathForSysCall[pathForSysCall.length-1] == '.') &&
(pathForSysCall.length == 1 ||
(pathForSysCall[pathForSysCall.length-2] == '/')))
{
throw new UnixException(EEXIST);
}
oflags |= (O_CREAT | O_EXCL);
} else {
if (flags.create)
oflags |= O_CREAT;
}
}
This shows that, unless you specify the WRITE option, the file will never be created.
Is this a bug or intended functionality, that FileChannel.open cannot create a file unless it is opened for writing?

I'm looking at the JDK 7 Javadoc for FileChannel.open(...).
The doc for the method says:
The READ and WRITE options determine if the file should be opened for reading and/or writing. If neither option (or the APPEND option) is contained in the array then the file is opened for reading.
The doc for CREATE_NEW says:
This option is ignored when the file is opened only for reading.
The doc for CREATE says:
This option is ignored if the CREATE_NEW option is also present or the file is opened only for reading.
Putting these three snippets together, yes, this is expected behavior.
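The documented behavior can be checked with a small experiment. This is a minimal sketch; the class name OpenOptionsDemo and the helper canOpen are invented here for illustration. It only reports whether FileChannel.open succeeds for a given combination of options.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.OpenOption;
import java.nio.file.Path;

final class OpenOptionsDemo {

    /**
     * Returns true if FileChannel.open succeeds with the given options,
     * false if it fails because the file does not exist.
     */
    static boolean canOpen(Path path, OpenOption... options) throws IOException {
        try (FileChannel channel = FileChannel.open(path, options)) {
            return true;
        } catch (NoSuchFileException e) {
            // Without WRITE (or APPEND) the file is opened for reading only,
            // so CREATE and CREATE_NEW are ignored and nothing is created.
            return false;
        }
    }
}
```

For a path that does not exist yet, canOpen(path, CREATE) should return false and leave no file behind, while canOpen(path, CREATE, WRITE) should create the file and return true.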

FileNotFound exception even though file is in the place during watch service in java

I have a watch service running on a folder. When I modify an existing file (eventKind == ENTRY_MODIFY, basically pasting the same file without removing the current one), I get a FileNotFoundException (The process cannot access the file because it is being used by another process.)
if (eventKind == StandardWatchEventKinds.ENTRY_MODIFY) {
String newFileChecksum = null;
if (eventPath.toFile().exists()) {
newFileChecksum = getFileChecksum(eventPath.toFile());
}
if (fileMapper.containsKey(eventPath)) {
String existingFileChecksum = fileMapper.get(eventPath);
if (!existingFileChecksum.equals(newFileChecksum)) {
fileMapper.replace(eventPath, existingFileChecksum, newFileChecksum);
log.info("listener.filemodified IN");
for (DirectoryListener listener : this.listeners) {
listener.fileModified(this, eventPath);
}
log.info("listener.filemodified OUT");
} else {
log.info("existing checksum");
log.debug(String.format(
"Checksum for file [%s] has not changed. Skipping plugin processing.",
eventPath.getFileName()));
}
}
}
In this code, getFileChecksum() is called here:
if (eventPath.toFile().exists()) {
newFileChecksum = getFileChecksum(eventPath.toFile());
}
So eventPath.toFile().exists() is true and execution enters the if block, but the call to getFileChecksum() then goes to this method:
private synchronized String getFileChecksum(File file) throws IOException, NoSuchAlgorithmException {
MessageDigest md5Digest = MessageDigest.getInstance("MD5");
FileInputStream fis = null;
if(file.exists()) {
try {
fis = new FileInputStream(file);
} catch(Exception e) {
e.printStackTrace();
}
} else {
log.warn("File not detected.");
}
byte[] byteArray = new byte[1024];
int bytesCount = 0;
while ((bytesCount = fis.read(byteArray)) != -1) {
md5Digest.update(byteArray, 0, bytesCount);
};
fis.close();
byte[] bytes = md5Digest.digest();
StringBuilder stringBuilder = new StringBuilder();
for (int i=0; i< bytes.length ;i++) {
stringBuilder.append(Integer.toString((bytes[i] & 0xff) + 0x100, 16).substring(1));
}
return stringBuilder.toString();
}
}
An exception is thrown at fis = new FileInputStream(file), even though the file is present in the folder:
FileNotFoundException (The process cannot access the file because it is being used by another process.)
I created a RandomAccessFile and a channel to release any lock placed on the file, but it did not work. Please suggest what could be happening here.
UPDATE: below is the infinite while loop that I have.
What is happening: when I put a file in the folder, one CREATE and two MODIFY events are fired. When I delete the file, one DELETE and one MODIFY are fired. If I then put the same file back into the folder, I get CREATE, but MODIFY is fired before CREATE has finished, so the modify handler runs instead of the create handler.
I fixed this issue by putting Thread.sleep(500) between these two lines:
WatchKey wk = watchService.take();
Thread.sleep(500);
for (WatchEvent<?> event : wk.pollEvents()) {
But I don't think I can justify the use of sleep here. Please help.
WatchService watchService = null;
WatchKey watchKey = null;
while (!this.canceled && (watchKey == null)) {
watchService = watchService == null
? FileSystems.getDefault().newWatchService() : watchService;
watchKey = this.directory.register(watchService,
StandardWatchEventKinds.ENTRY_MODIFY, StandardWatchEventKinds.ENTRY_DELETE,
StandardWatchEventKinds.ENTRY_CREATE);
}
while (!this.canceled) {
try {
WatchKey wk = watchService.take();
for (WatchEvent<?> event : wk.pollEvents()) {
Kind<?> eventKind = event.kind();
System.out.println("Event kind : " + eventKind);
Path dir = (Path)wk.watchable();
Path eventPath = (Path) event.context();
Path fullPath = dir.resolve(eventPath);
fireEvent(eventKind, fullPath);
}
wk.reset();
}

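I have a better approach: use a while loop on a flag such as isFileReady, and inside the loop try to open the file in a try/catch.
boolean isFileReady = false;
while (!isFileReady) {
// "file" is the File reported by the watch event
try (FileInputStream fis = new FileInputStream(file)) {
isFileReady = true; // opened successfully, so no other process is blocking it
} catch (IOException e) {
System.out.println("File not ready yet");
Thread.sleep(100); // wait briefly before the next attempt (the caller must handle InterruptedException)
}
}
This will solve your problem.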

The WatchService is verbose and may report multiple ENTRY_MODIFY events for a single save operation, even while another application is partway through a write or is writing repeatedly. Your code is probably acting on a modify event while the other app is still writing, and there may be a second ENTRY_MODIFY on its way.
A safer strategy for using the WatchService is to collate the events you receive and only act on the changes when there is a pause. Something like this will ensure that you block on first event but then poll the watch service with small timeout to see if more changes are present before you act on the previous set:
WatchService ws = ...
HashSet<Path> modified = new HashSet<>();
while(appIsRunning) {
int countNow = modified.size();
WatchKey k = countNow == 0 ? ws.take() : ws.poll(1, TimeUnit.MILLISECONDS);
if (k != null) {
// Loop through k.pollEvents() and put modify file path into modified set:
// DO NOT CALL fireEvent HERE, save the path instead:
...
if (eventKind == ENTRY_MODIFY)
modified.add(filePath);
}
// Don't act on changes unless no new events:
if (countNow == modified.size()) {
// ACT ON modified list here - the watch service did not report new changes
for (Path filePath : modified) {
// call fireEvent HERE:
fireEvent(filePath);
}
// reset the list so next watch call is take() not poll(1)
modified.clear();
}
}
If you are also looking out for CREATE and DELETE operations with MODIFY you will have to collate and ignore some of the earlier events because the last recorded event type can take precedence over a previously recorded type. For example, if calling take() then poll(1) until nothing new is reported:
Any DELETE then CREATE => you might want to consider as MODIFY
Any CREATE then MODIFY => you might want to consider as CREATE
Any CREATE or MODIFY then a DELETE => treat as DELETE
Your logic should also act only when the value of modified.size() + created.size() + deleted.size() has not changed between runs, that is, when the watch service reported nothing new.
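The precedence rules above can be written down as a small merge function. This is a sketch only: the class EventCollator and its methods record, merge, and drain are hypothetical names, and it keeps one pending kind per path instead of three separate sets.

```java
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.util.LinkedHashMap;
import java.util.Map;

final class EventCollator {

    // Latest effective event kind per path, in arrival order.
    private final Map<Path, WatchEvent.Kind<?>> pending = new LinkedHashMap<>();

    /** Records an event, folding it into whatever is already pending for the path. */
    void record(Path path, WatchEvent.Kind<?> kind) {
        pending.put(path, merge(pending.get(path), kind));
    }

    /** Applies the precedence rules from the answer to two consecutive kinds. */
    static WatchEvent.Kind<?> merge(WatchEvent.Kind<?> previous, WatchEvent.Kind<?> next) {
        if (previous == null) {
            return next;
        }
        if (next == StandardWatchEventKinds.ENTRY_DELETE) {
            return next; // CREATE or MODIFY then DELETE => DELETE
        }
        if (previous == StandardWatchEventKinds.ENTRY_DELETE
                && next == StandardWatchEventKinds.ENTRY_CREATE) {
            return StandardWatchEventKinds.ENTRY_MODIFY; // DELETE then CREATE => MODIFY
        }
        if (previous == StandardWatchEventKinds.ENTRY_CREATE
                && next == StandardWatchEventKinds.ENTRY_MODIFY) {
            return previous; // CREATE then MODIFY => still CREATE
        }
        return next;
    }

    /** Returns the collated events and clears the pending state. */
    Map<Path, WatchEvent.Kind<?>> drain() {
        Map<Path, WatchEvent.Kind<?>> result = new LinkedHashMap<>(pending);
        pending.clear();
        return result;
    }
}
```

The watch loop would call record for every event from pollEvents(), and call drain (followed by fireEvent for each entry) only when a poll with a short timeout reports nothing new.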

Let me guess...
The modify event fires when you modify a file. To modify the file you most likely use a separate tool, such as Notepad, that opens and locks the file.
Your watcher gets an event saying the file is being modified (right now), but you cannot open it yourself (which is what FileInputStream tries to do) since it is already locked.

Cannot find an image file that exists in java

I have written a function which takes in a BufferedImage and compares it to a pre-existing image on my hard drive, checking whether they are the same.
public boolean checkIfSimilarImages(BufferedImage imgA, File B) {
DataBuffer imgAdata = imgA.getData().getDataBuffer();
int sizeA = imgAdata.getSize();
BufferedImage imgB = null;
try {
imgB = ImageIO.read(B);
} catch (IOException ex) {
Logger.getLogger(SupportClass.class.getName()).log(Level.SEVERE, null, ex);
}
DataBuffer imgBdata = imgB.getData().getDataBuffer();
int sizeB = imgBdata.getSize();
if(sizeA == sizeB) {
for(int i = 0; i < sizeA; i++) {
if (imgAdata.getElem(i) != imgBdata.getElem(i)) {
return false;
}
}
}
return true;
}
This throws an IOException, "Can't read input file!". I don't know why this is happening. I am calling the function like this:
while(support.checkIfSimilarImages(currentDisplay, new File(pathToOriginalImage)) == false) {
System.out.println("Executing while-loop!");
bot.delay(3000);
currentDisplay = bot.createScreenCapture(captureArea);
}
where,
String pathToOriginalImage = "‪‪‪‪C:\\Users\\Chandrachur\\Desktop\\Home.jpg";
I can see that the path is valid. But when I test File.exists(), File.canRead(), or File.getAbsoluteFile().exists() inside the checkIfSimilarImages function, everything returns false.
I have researched my question here and tried out these suggestions:
It is not only for this location, I have tried a variety of other locations but in vain. Also it is not a problem where I have hidden file extensions and the actual file might be Home.jpg.jpg .
The only other possibility is that the permissions are different. I don't really know how to verify this, but there is no reason the file should have permissions that make it unreadable by Java. It is just another normal jpg file.
Could it be that, because I am passing the File object reference into a function, the reference is somehow getting modified in the process? I just don't know. I am running out of possibilities to test for...
The whole stack trace is as follows:
javax.imageio.IIOException: Can't read input file!
at javax.imageio.ImageIO.read(ImageIO.java:1301)
at battlesbot.SupportClass.checkIfSimilarImages(SupportClass.java:77)
at battlesbot.AutomatedActions.reachHomeScreen(AutomatedActions.java:72)
at battlesbot.BattlesBot.main(BattlesBot.java:22)
Exception in thread "main" java.lang.NullPointerException
at battlesbot.SupportClass.checkIfSimilarImages(SupportClass.java:81)
at battlesbot.AutomatedActions.reachHomeScreen(AutomatedActions.java:72)
at battlesbot.BattlesBot.main(BattlesBot.java:22)
C:\Users\Chandrachur\AppData\Local\NetBeans\Cache\8.2\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 11 seconds)
I am on Windows 10, IDE is NetBeans.
UPDATE:
Huge thanks to @k5_. He told me to paste this in as the path, and it worked.
"C:/Users/Chandrachur/Desktop/Home.jpg";
It seems some invisible characters were in the path. But I still don't understand what that means.

Usually this kind of problem comes down to access rights or typos in the filename.
In this case there were some invisible Unicode characters (U+202A, LEFT-TO-RIGHT EMBEDDING) at the start of the path. The Windows dialog box that the file path was copied from uses them to mark the direction of writing (left to right).
One way of displaying them is the loop below; the string has 4 invisible characters at its start. You would also see them in a debugger.
String x = "‪‪‪‪C:\\Users\\Chandrachur\\Desktop\\Home.jpg";
for(char c : x.toCharArray()) {
System.out.println( c + " " + (int) c);
}
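If paths pasted from such dialogs cannot be avoided, the directional marks can be stripped before the path is used. The helper below is a sketch; PathCleaner and stripDirectionalMarks are invented names, and the character list (U+200E, U+200F, U+202A to U+202E, U+2066 to U+2069) is an assumption about which invisible directional characters are worth removing.

```java
final class PathCleaner {

    // Unicode directional formatting characters, written as regex escapes
    // so that the source file itself contains no invisible characters.
    private static final String DIRECTIONAL_MARKS =
            "[\\u200E\\u200F\\u202A-\\u202E\\u2066-\\u2069]";

    /** Removes invisible directional formatting characters from a path string. */
    static String stripDirectionalMarks(String rawPath) {
        return rawPath.replaceAll(DIRECTIONAL_MARKS, "");
    }
}
```

The cleaned string can then be passed to new File(...). Retyping the path by hand, as in the update to the question, achieves the same thing.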

Can't co-create object / Can't find moniker | Jacob

When creating an ActiveXComponent using JACOB I get the following error.
com.jacob.com.ComFailException: Can't co-create object
at com.jacob.com.Dispatch.createInstanceNative(Native Method)
at com.jacob.com.Dispatch.<init>(Dispatch.java:99)
at com.jacob.activeX.ActiveXComponent.<init>(ActiveXComponent.java:58)
at com.paston.jacobtest.RidderIQ.main(RidderIQ.java:30)
The COM object I need to use comes from a program that doesn't register its DLLs itself during installation.
To register the DLL I used the 64-bit version of RegAsm, following this article. I also tried to load every DLL of the external program, because I suspected that there could be something wrong with loading the dependencies.
Here is my current code:
public static void main(String[] args) {
String dllDir = "C:\\Program Files (x86)\\Ridder iQ Client\\Bin\\";
File folder = new File( dllDir );
for (final File fileEntry : folder.listFiles()) {
String str = fileEntry.getName();
if (str.substring(str.lastIndexOf('.') + 1).equals("dll")) {
System.out.println(fileEntry.getName());
System.load(dllDir + str);
}
}
try {
ActiveXComponent example = new ActiveXComponent("RidderIQSDK");
} catch (Exception e) {
System.out.println(e.getMessage());
e.printStackTrace();
}
}
When I change the name to the CLSID, I get a different exception.
com.jacob.com.ComFailException: Can't find moniker
at com.jacob.com.Dispatch.createInstanceNative(Native Method)
at com.jacob.com.Dispatch.<init>(Dispatch.java:99)
at com.jacob.activeX.ActiveXComponent.<init>(ActiveXComponent.java:58)
at com.paston.jacobtest.RidderIQ.main(RidderIQ.java:28)
I got JACOB to work with my code in another test using the system's Random object.
ActiveXComponent random = new ActiveXComponent("clsid:4E77EC8F-51D8-386C-85FE-7DC931B7A8E7");
Object obj = random.getObject();
Object result = Dispatch.call((Dispatch) obj, "Next");
System.out.println("Result: "+result);

I tried every solution and finally got JACOB to work. Structure your code like the sample below.
public static void main(String[] args) {
String libFile = System.getProperty("os.arch").equals("amd64") ? "jacob-1.17-x64.dll" :"jacob-1.17-x86.dll";
try{
/**
* Reading jacob.dll file
*/
InputStream inputStream = certificatemain.class.getResourceAsStream(libFile);
/**
* Step 1: Create temporary file under <%user.home%>\AppData\Local\Temp\jacob.dll
* Step 2: Write contents of `inputStream` to that temporary file.
*/
File temporaryDll = File.createTempFile("jacob", ".dll");
FileOutputStream outputStream = new FileOutputStream(temporaryDll);
byte[] array = new byte[8192];
for (int i = inputStream.read(array); i != -1; i = inputStream.read(array)){
outputStream.write(array, 0, i);
}
outputStream.close();
/* Temporary file will be removed after terminating-closing-ending the application-program */
System.setProperty(LibraryLoader.JACOB_DLL_PATH, temporaryDll.getAbsolutePath());
LibraryLoader.loadJacobLibrary();
ActiveXComponent comp=new ActiveXComponent("Com.Calculation");
System.out.println("The Library been loaded, and an activeX component been created");
int arg1=100;
int arg2=50;
//using the functions from the library:
int summation=Dispatch.call(comp, "sum",arg1,arg2).toInt();
System.out.println("Summation= "+ summation);
}catch(Exception e){
e.printStackTrace();
}
}
Now let me tell you how to register your DLL. I followed the same article you mentioned, but it did not work when dealing with an applet.
Go to the x86 framework directory on the command line:
C:\Windows\Microsoft.NET\Framework\v2.0.50727
To register, run:
regasm.exe path_to_your_dll.dll /codebase
Don't pass any flag other than /codebase. That's all. If you still run into problems, let me know.

Is there a workaround for Java's poor performance on walking huge directories?

I am trying to process files one at a time that are stored over a network. Reading the files is fast thanks to buffering, so that is not the issue. The problem I have is just listing the files in a folder. I have at least 10k files per folder, over many folders.
Performance is super slow since File.list() returns an array instead of an iterable. Java goes off and collects all the names in a folder and packs them into an array before returning.
The bug entry for this is http://bugs.sun.com/view_bug.do;jsessionid=db7fcf25bcce13541c4289edeb4?bug_id=4285834 and doesn't have a workaround. They just say this has been fixed for JDK7.
A few questions:
Does anybody have a workaround to this performance bottleneck?
Am I trying to achieve the impossible? Is performance still going to be poor even if it just iterates over the directories?
Could I use the beta JDK7 builds that have this functionality without having to build my entire project on it?

Although it's not pretty, I solved this kind of problem once by piping the output of dir/ls to a file before starting my app, and passing in the filename.
If you needed to do it within the app, you could just use Runtime.getRuntime().exec(), but it would create some nastiness.
You asked. The first form is going to be blazingly fast, the second should be pretty fast as well.
Be sure to do the one item per line (bare, no decoration, no graphics), full path and recurse options of your selected command.
EDIT:
30 minutes just to get a directory listing, wow.
It just struck me that if you use exec(), you can get its stdout redirected into a pipe instead of writing it to a file.
If you did that, you should start getting the files immediately and be able to begin processing before the command has completed.
The interaction may actually slow things down, but maybe not--you might give it a try.
Wow, I just went to find the syntax of the .exec command for you and came across this, possibly exactly what you want (it lists a directory using exec and "ls" and pipes the result into your program for processing): good link in wayback (Jörg provided in a comment to replace this one from sun that Oracle broke)
Anyway, the idea is straightforward but getting the code right is annoying. I'll go steal some codes from the internets and hack them up--brb
/**
* Note: Only use this as a last resort! It's specific to windows and even
* at that it's not a good solution, but it should be fast.
*
* to use it, extend FileProcessor and call processFiles("...") with a list
* of options if you want them like /s... I highly recommend /b
*
* override processFile and it will be called once for each line of output.
*/
import java.io.*;
public abstract class FileProcessor
{
public void processFiles(String dirOptions)
{
Process theProcess = null;
BufferedReader inStream = null;
// run "dir" through the Windows command interpreter
try
{
theProcess = Runtime.getRuntime().exec("cmd /c dir " + dirOptions);
}
catch(IOException e)
{
System.err.println("Error on exec() method");
e.printStackTrace();
return; // nothing to read if the command could not be started
}
// read the command's standard output, one line per file
try
{
inStream = new BufferedReader(
new InputStreamReader( theProcess.getInputStream() ));
String line;
while ((line = inStream.readLine()) != null)
{
processFile(line);
}
}
catch(IOException e)
{
System.err.println("Error on inStream.readLine()");
e.printStackTrace();
}
} // end method
/** Override this method--it will be called once for each file */
public abstract void processFile(String filename);
} // end class
And thank you code donor at IBM

How about using the File.list(FilenameFilter filter) method and implementing FilenameFilter.accept(File dir, String name) to process each file and return false?
I ran this on a Linux VM for a directory with 10K+ files and it took less than 10 seconds.
import java.io.File;
import java.io.FilenameFilter;
public class Temp {
private static void processFile(File dir, String name) {
File file = new File(dir, name);
System.out.println("processing file " + file.getName());
}
private static void forEachFile(File dir) {
String [] ignore = dir.list(new FilenameFilter() {
public boolean accept(File dir, String name) {
processFile(dir, name);
return false;
}
});
}
public static void main(String[] args) {
long before, after;
File dot = new File(".");
before = System.currentTimeMillis();
forEachFile(dot);
after = System.currentTimeMillis();
System.out.println("after call, delta is " + (after - before));
}
}

An alternative is to have the files served over a different protocol. As I understand it, you're using SMB, and Java is just trying to list them as regular files.
The problem here might not be Java alone (how does it behave when you open that directory, x:\shared, with Windows Explorer?). In my experience that also takes a considerable amount of time.
You can change the protocol to something like HTTP, only to fetch the file names. This way you can retrieve the list of files over HTTP (10k lines shouldn't be too much) and let the server deal with the file listing. This would be very fast, since it runs against local resources (those on the server).
Then, when you have the list, you can process the files one by one, exactly the way you're doing right now.
The key point is to have a helper mechanism on the other side.
Is this feasible?
Today:
File [] content = new File("X:\\remote\\dir").listFiles();
for ( File f : content ) {
process( f );
}
Proposed:
String [] content = fetchViaHttpTheListNameOf("x:\\remote\\dir");
for ( String fileName : content ) {
process( new File( fileName ) );
}
The HTTP server could be very small and simple.
If this is the way you have it right now, you are fetching the information for all 10k files to your client machine (I don't know how much of that info) when you only need the file names for later processing.
If the processing is very fast right now, it may be slowed down a bit, because the prefetched information is no longer available.
Give it a try.
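As a rough sketch of the client side, assuming the server answers with one file name per line in UTF-8 (a format made up here for illustration), the listing could be read like this. RemoteListing, readNames, and fetchNames are hypothetical names standing in for fetchViaHttpTheListNameOf above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class RemoteListing {

    /** Reads one file name per line, skipping blank lines. */
    static List<String> readNames(Reader source) throws IOException {
        List<String> names = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            String name = line.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Fetches the listing from a URL that returns plain text, one name per line. */
    static List<String> fetchNames(String listingUrl) throws IOException {
        try (Reader reader = new InputStreamReader(
                new URL(listingUrl).openStream(), StandardCharsets.UTF_8)) {
            return readNames(reader);
        }
    }
}
```

Each returned name can then be wrapped in new File(...) and processed as before. Whether this beats the SMB listing depends on the server and the network, so it is worth measuring.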

A non-portable solution would be to make native calls to the operating system and stream the results.
For Linux
You can look at something like readdir. You can walk the directory structure like a linked list and return results in batches or individually.
For Windows
On Windows the behavior would be fairly similar using the FindFirstFile and FindNextFile APIs.
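On JDK 7 and later, a comparable streaming style is available without native code through java.nio.file.Files.newDirectoryStream, which hands out entries through an iterator instead of returning one large array. The sketch below assumes that API is available; StreamingLister, FileHandler, and forEachEntry are names invented for the example, and whether it actually helps over a network share still has to be measured.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class StreamingLister {

    /** Callback invoked once per directory entry (hypothetical helper type). */
    interface FileHandler {
        void handle(Path entry) throws IOException;
    }

    /**
     * Iterates over the entries of a directory one at a time and
     * returns how many entries were handled.
     */
    static long forEachEntry(Path dir, FileHandler handler) throws IOException {
        long count = 0;
        // The stream must be closed to release the underlying directory handle.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                handler.handle(entry);
                count++;
            }
        }
        return count;
    }
}
```

Processing can start with the first entry instead of waiting for the complete listing, which is the property the readdir and FindNextFile approaches above are after.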

I doubt the problem is related to the bug report you referenced.
The issue there is "only" memory usage, not necessarily speed.
If you have enough memory the bug is not relevant for your problem.
You should measure whether your problem is memory related or not. Turn on your Garbage Collector log and use for example gcviewer to analyze your memory usage.
I suspect that it has to do with the SMB protocol causing the problem.
You can try to write a test in another language and see if it's faster, or you can try to get the list of filenames through some other method, such as described here in another post.

If you need to eventually process all files, then having Iterable over String[] won't give you any advantage, as you'll still have to go and fetch the whole list of files.

If you're on Java 1.5 or 1.6, shelling out "dir" commands and parsing the standard output stream on Windows is a perfectly acceptable approach. I've used this approach in the past for processing network drives and it has generally been a lot faster than waiting for the native java.io.File listFiles() method to return.
Of course, a JNI call should be faster and potentially safer than shelling out "dir" commands. The following JNI code can be used to retrieve a list of files/directories using the Windows API. This function can be easily refactored into a new class so the caller can retrieve file paths incrementally (i.e. get one path at a time). For example, you can refactor the code so that FindFirstFileW is called in a constructor and have a separate method to call FindNextFileW.
JNIEXPORT jstring JNICALL Java_javaxt_io_File_GetFiles(JNIEnv *env, jclass, jstring directory)
{
HANDLE hFind;
try {
//Convert jstring to wstring
const jchar *_directory = env->GetStringChars(directory, 0);
jsize x = env->GetStringLength(directory);
wstring path; //L"C:\\temp\\*";
path.assign(_directory, _directory + x);
env->ReleaseStringChars(directory, _directory);
if (x<2){
jclass exceptionClass = env->FindClass("java/lang/Exception");
env->ThrowNew(exceptionClass, "Invalid path, less than 2 characters long.");
return NULL; // ThrowNew only marks the exception as pending; return so it propagates to Java
}
wstringstream ss;
BOOL bContinue = TRUE;
WIN32_FIND_DATAW data;
hFind = FindFirstFileW(path.c_str(), &data);
if (INVALID_HANDLE_VALUE == hFind){
jclass exceptionClass = env->FindClass("java/lang/Exception");
env->ThrowNew(exceptionClass, "FindFirstFileW returned invalid handle.");
return NULL; // do not keep using an invalid handle
}
//HANDLE hStdOut = GetStdHandle(STD_OUTPUT_HANDLE);
//DWORD dwBytesWritten;
// If we have no error, loop thru the files in this dir
while (hFind && bContinue){
/*
//Debug Print Statment. DO NOT DELETE! cout and wcout do not print unicode correctly.
WriteConsole(hStdOut, data.cFileName, (DWORD)_tcslen(data.cFileName), &dwBytesWritten, NULL);
WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);
*/
//Check if this entry is a directory
if (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY){
// Make sure this dir is not . or ..
if (wstring(data.cFileName) != L"." &&
wstring(data.cFileName) != L"..")
{
ss << wstring(data.cFileName) << L"\\" << L"\n";
}
}
else{
ss << wstring(data.cFileName) << L"\n";
}
bContinue = FindNextFileW(hFind, &data);
}
FindClose(hFind); // Free the dir structure
wstring cstr = ss.str();
int len = cstr.size();
//WriteConsole(hStdOut, cstr.c_str(), len, &dwBytesWritten, NULL);
//WriteConsole(hStdOut, L"\n", 1, &dwBytesWritten, NULL);
jchar* raw = new jchar[len];
memcpy(raw, cstr.c_str(), len*sizeof(wchar_t));
jstring result = env->NewString(raw, len);
delete[] raw;
return result;
}
catch(...){
FindClose(hFind);
jclass exceptionClass = env->FindClass("java/lang/Exception");
env->ThrowNew(exceptionClass, "Exception occurred.");
}
return NULL;
}
Credit:
https://sites.google.com/site/jozsefbekes/Home/windows-programming/miscellaneous-functions
Even with this approach, there are still efficiencies to be gained. If you convert each path to a java.io.File, there is a huge performance hit, especially if the path represents a file on a network drive. I have no idea what Sun/Oracle is doing under the hood, but if you need file attributes in addition to the file path (e.g. size, modification date), I have found that the following JNI function is much faster than instantiating a java.io.File object for a network path.
JNIEXPORT jlongArray JNICALL Java_javaxt_io_File_GetFileAttributesEx(JNIEnv *env, jclass, jstring filename)
{
//Convert jstring to wstring
const jchar *_filename = env->GetStringChars(filename, 0);
jsize len = env->GetStringLength(filename);
wstring path;
path.assign(_filename, _filename + len);
env->ReleaseStringChars(filename, _filename);
//Get attributes
WIN32_FILE_ATTRIBUTE_DATA fileAttrs;
BOOL result = GetFileAttributesExW(path.c_str(), GetFileExInfoStandard, &fileAttrs);
if (!result) {
jclass exceptionClass = env->FindClass("java/lang/Exception");
env->ThrowNew(exceptionClass, "Exception Occurred");
return NULL; // fileAttrs is not valid, so do not build the result array
}
//Create an array to store the WIN32_FILE_ATTRIBUTE_DATA
jlong buffer[6];
buffer[0] = fileAttrs.dwFileAttributes;
buffer[1] = date2int(fileAttrs.ftCreationTime);
buffer[2] = date2int(fileAttrs.ftLastAccessTime);
buffer[3] = date2int(fileAttrs.ftLastWriteTime);
buffer[4] = fileAttrs.nFileSizeHigh;
buffer[5] = fileAttrs.nFileSizeLow;
jlongArray jLongArray = env->NewLongArray(6);
env->SetLongArrayRegion(jLongArray, 0, 6, buffer);
return jLongArray;
}
You can find a full working example of this JNI-based approach in the javaxt-core library. In my tests using Java 1.6.0_38 with a Windows host hitting a Windows share, I have found this JNI approach approximately 10x faster than calling java.io.File listFiles() or shelling out "dir" commands.
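On the Java side, the six-element array returned by the function above still has to be decoded. The sketch below is an assumption about how a caller might do that, not code from javaxt-core; FileAttrDecoder and its methods are invented names. It relies only on the layout shown in the C++ code (index 0 for the attribute flags, 4 and 5 for the high and low halves of the size) and on the Windows constant FILE_ATTRIBUTE_DIRECTORY being 0x10.

```java
final class FileAttrDecoder {

    // Value of the Windows FILE_ATTRIBUTE_DIRECTORY flag.
    private static final long FILE_ATTRIBUTE_DIRECTORY = 0x10L;

    /** Combines nFileSizeHigh (index 4) and nFileSizeLow (index 5) into one 64-bit size. */
    static long fileSize(long[] attrs) {
        long high = attrs[4] & 0xFFFFFFFFL; // keep only the 32 bits of each DWORD
        long low = attrs[5] & 0xFFFFFFFFL;
        return (high << 32) | low;
    }

    /** Tests the directory flag in dwFileAttributes (index 0). */
    static boolean isDirectory(long[] attrs) {
        return (attrs[0] & FILE_ATTRIBUTE_DIRECTORY) != 0;
    }
}
```

The three timestamps at indices 1 to 3 depend on what the date2int helper returns, which is not shown here, so they are left undecoded.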

I wonder why there are 10k files in a directory. Some file systems do not work well with so many files. File systems have specific limits, such as the maximum number of files per directory and the maximum depth of subdirectories.
I solved a similar problem with an iterator.
I needed to walk huge directories and several levels of the directory tree recursively.
I tried FileUtils.iterateFiles() from Apache Commons IO, but it implements the iterator by adding all the files to a List and then returning List.iterator(), which is very bad for memory.
So I preferred to write something like this:
private static class SequentialIterator implements Iterator<File> {
private DirectoryStack dir = null;
private File current = null;
private long limit;
private FileFilter filter = null;
public SequentialIterator(String path, long limit, FileFilter ff) {
current = new File(path);
this.limit = limit;
filter = ff;
dir = DirectoryStack.getNewStack(current);
}
public boolean hasNext() {
while(walkOver());
return isMore && (limit > count || limit < 0) && dir.getCurrent() != null;
}
private long count = 0;
public File next() {
File aux = dir.getCurrent();
dir.advancePostition();
count++;
return aux;
}
private boolean walkOver() {
if (dir.isOutOfDirListRange()) {
if (dir.isCantGoParent()) {
isMore = false;
return false;
} else {
dir.goToParent();
dir.advancePostition();
return true;
}
} else {
if (dir.isCurrentDirectory()) {
if (dir.isDirectoryEmpty()) {
dir.advancePostition();
} else {
dir.goIntoDir();
}
return true;
} else {
if (filter.accept(dir.getCurrent())) {
return false;
} else {
dir.advancePostition();
return true;
}
}
}
}
private boolean isMore = true;
public void remove() {
throw new UnsupportedOperationException();
}
}
Note that the iterator stops after a given number of files (limit; a negative value means no limit) and also takes a FileFilter.
And DirectoryStack is:
public class DirectoryStack {
private class Element{
private File files[] = null;
private int currentPointer;
public Element(File current) {
currentPointer = 0;
if (current.exists()) {
if(current.isDirectory()){
files = current.listFiles();
Set<File> set = new TreeSet<File>();
for (int i = 0; i < files.length; i++) {
File file = files[i];
set.add(file);
}
set.toArray(files);
}else{
throw new IllegalArgumentException("File current must be directory");
}
} else {
throw new IllegalArgumentException("File current not exist");
}
}
public String toString(){
return "current="+getCurrent().toString();
}
public int getCurrentPointer() {
return currentPointer;
}
public void setCurrentPointer(int currentPointer) {
this.currentPointer = currentPointer;
}
public File[] getFiles() {
return files;
}
public File getCurrent(){
File ret = null;
try{
ret = getFiles()[getCurrentPointer()];
}catch (Exception e){
// pointer is past the end of the listing: there is no current file
}
return ret;
}
public boolean isDirectoryEmpty(){
return !(getFiles().length>0);
}
public Element advancePointer(){
setCurrentPointer(getCurrentPointer()+1);
return this;
}
}
private DirectoryStack(File first){
getStack().push(new Element(first));
}
public static DirectoryStack getNewStack(File first){
return new DirectoryStack(first);
}
public String toString(){
String ret = "stack:\n";
int i = 0;
for (Element elem : stack) {
ret += "level " + i++ + elem.toString()+"\n";
}
return ret;
}
private Stack<Element> stack=null;
private Stack<Element> getStack(){
if(stack==null){
stack = new Stack<Element>();
}
return stack;
}
public File getCurrent(){
return getStack().peek().getCurrent();
}
public boolean isDirectoryEmpty(){
return getStack().peek().isDirectoryEmpty();
}
public DirectoryStack downLevel(){
getStack().pop();
return this;
}
public DirectoryStack goToParent(){
return downLevel();
}
public DirectoryStack goIntoDir(){
return upLevel();
}
public DirectoryStack upLevel(){
if(isCurrentNotNull())
getStack().push(new Element(getCurrent()));
return this;
}
public DirectoryStack advancePostition(){
getStack().peek().advancePointer();
return this;
}
public File[] peekDirectory(){
return getStack().peek().getFiles();
}
public boolean isLastFileOfDirectory(){
return getStack().peek().getFiles().length <= getStack().peek().getCurrentPointer();
}
public boolean gotMoreLevels() {
return getStack().size()>0;
}
public boolean gotMoreInCurrentLevel() {
return getStack().peek().getFiles().length > getStack().peek().getCurrentPointer()+1;
}
public boolean isRoot() {
return !(getStack().size()>1);
}
public boolean isCurrentNotNull() {
if(!getStack().isEmpty()){
int currentPointer = getStack().peek().getCurrentPointer();
int maxFiles = getStack().peek().getFiles().length;
return currentPointer < maxFiles;
}else{
return false;
}
}
public boolean isCurrentDirectory() {
return getStack().peek().getCurrent().isDirectory();
}
public boolean isLastFromDirList() {
return getStack().peek().getCurrentPointer() == (getStack().peek().getFiles().length-1);
}
public boolean isCantGoParent() {
return !(getStack().size()>1);
}
public boolean isOutOfDirListRange() {
return getStack().peek().getFiles().length <= getStack().peek().getCurrentPointer();
}
}
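For comparison, on Java 7 or later a similar lazy walk can be sketched with java.nio.file.DirectoryStream, which hands out entries as you iterate instead of building the whole listing first. This is a different technique from the iterator above, not a rewrite of it; the class name LazyWalk and its walk method are hypothetical, and the Consumer parameter assumes Java 8.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical sketch: walks a tree while keeping only the pending
// directories in memory, never the full list of files.
final class LazyWalk {

    // Visits up to 'limit' files under root (a negative limit means no limit)
    // and returns how many files were passed to the action.
    static long walk(Path root, long limit, Consumer<Path> action) throws IOException {
        Deque<Path> pending = new ArrayDeque<>();
        pending.push(root);
        long count = 0;
        while (!pending.isEmpty()) {
            Path dir = pending.pop();
            // try-with-resources closes the underlying directory handle.
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    // NOFOLLOW_LINKS avoids cycles through symbolic links.
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        pending.push(entry);
                    } else {
                        if (limit >= 0 && count >= limit) {
                            return count;
                        }
                        count++;
                        action.accept(entry);
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        long visited = walk(root, 100, p -> System.out.println(p));
        System.out.println(visited + " files visited");
    }
}
```

Unlike the iterator above, this sketch does not sort entries, so the visiting order depends on the file system.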

Using an Iterable doesn't imply that the files will be streamed to you. In fact, it's usually the opposite, so an array is typically faster than an Iterable.

Are you sure it's due to Java, not just a general problem with having 10k entries in one directory, particularly over the network?
Have you tried writing a proof-of-concept program to do the same thing in C using the win32 findfirst/findnext functions to see whether it's any faster?
I don't know the ins and outs of SMB, but I strongly suspect that it needs a round trip for every file in the list - which is not going to be fast, particularly over a network with moderate latency.
Having 10k strings in an array sounds like something which should not tax the modern Java VM too much either.
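Before writing the C version, a rough Java baseline can show whether the time goes into listing the names or into the per-file attribute lookups. This is only a sketch under that assumption; the class ListTiming and its methods are hypothetical, and a single run is a coarse measurement rather than a proper benchmark.

```java
import java.io.File;

// Hypothetical timing harness: compares listing names only against
// listing plus one attribute lookup per file.
final class ListTiming {

    // Keeps the length() results in use so the loop cannot be skipped.
    static long sink;

    // Nanoseconds for one File.list() call, or -1 if dir cannot be listed.
    static long timeNamesOnly(File dir) {
        long start = System.nanoTime();
        String[] names = dir.list();
        long elapsed = System.nanoTime() - start;
        return names == null ? -1 : elapsed;
    }

    // Nanoseconds for listFiles() plus File.length() on every entry,
    // or -1 if dir cannot be listed.
    static long timeWithLengths(File dir) {
        long start = System.nanoTime();
        File[] files = dir.listFiles();
        if (files == null) {
            return -1;
        }
        for (File f : files) {
            sink += f.length(); // one attribute lookup per file
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("names only:   " + timeNamesOnly(dir) + " ns");
        System.out.println("with lengths: " + timeWithLengths(dir) + " ns");
    }
}
```

If the second number dominates on the network share, the cost is in the per-file lookups rather than in holding 10k names in an array.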
