I am trying to get a list of FTP files by using
FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY);
There are more than 27553 files in this directory, and I expect the number to grow.
Now I need to retrieve one file from this huge list. I am doing the following:
for (FTPFile ftpFile : ftpFiles)
{
    if (fileName.equalsIgnoreCase(ftpFile.getName()))
    {
        print(ftpFile);
    }
}
But let's say the file I want to print is the last of the 27553 files: it takes about a minute to go through the whole list, checking whether each entry is the file I'm looking for. Not only that, it also gives me an "org.apache.commons.net.ftp.FTPConnectionClosedException: FTP response 421 received. Server closed connection." after about 900 seconds.
How can I tune this program to find the file faster? I don't want it to run for 900 seconds. Below is the actual method that takes so long; please suggest how I can reduce the time taken. In debug mode, the method runs for hundreds of seconds. On a production server it takes more than a minute or two, which is still not acceptable.
private boolean PDFReport(ActionForm form, HttpServletResponse response,
        String fileName, String READ_DIRECTORY) throws Exception
{
    boolean test = false;
    FTPClient ftp = new FTPClient();
    DataSourceReader dsr = new DataSourceReader();
    dsr.getFtpLinks();
    String ftppassword = dsr.getFtppassword();
    String ftpserver = dsr.getFtpserver();
    String ftpusername = dsr.getFtpusername();
    ftp.connect(ftpserver);
    ftp.login(ftpusername, ftppassword);
    InputStream is = null;
    BufferedInputStream bis = null;
    try
    {
        int reply = ftp.getReplyCode();
        if (!FTPReply.isPositiveCompletion(reply))
        {
            ftp.disconnect();
            System.out.println("FTP server refused connection.");
        } else
        {
            ftp.enterLocalPassiveMode();
            FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY);
            for (FTPFile ftpFile : ftpFiles)
            {
                String FilePdf = ftpFile.getName();
                ftp.setFileType(FTP.BINARY_FILE_TYPE);
                if (FilePdf.equalsIgnoreCase(fileName))
                {
                    String strFile = READ_DIRECTORY + "/" + FilePdf;
                    boolean fileFormatType = fileName.endsWith(".PDF");
                    if (fileFormatType)
                    {
                        if (FilePdf != null && FilePdf.length() > 0)
                        {
                            is = ftp.retrieveFileStream(strFile);
                            bis = new BufferedInputStream(is);
                            response.reset();
                            response.setContentType("application/pdf");
                            response.setHeader("Content-Disposition",
                                    "inline;filename=example.pdf");
                            ServletOutputStream outputStream = response.getOutputStream();
                            byte[] buffer = new byte[1024];
                            int readCount;
                            while ((readCount = bis.read(buffer)) > 0)
                            {
                                outputStream.write(buffer, 0, readCount);
                            }
                            outputStream.flush();
                            outputStream.close();
                        }
                    } else
                    {
                        if (FilePdf != null && FilePdf.length() > 0)
                        {
                            is = ftp.retrieveFileStream(strFile);
                            bis = new BufferedInputStream(is);
                            response.reset();
                            response.setContentType("application/TIFF");
                            response.setHeader("Content-Disposition",
                                    "inline;filename=example.tiff");
                            ServletOutputStream outputStream = response.getOutputStream();
                            byte[] buffer = new byte[1024];
                            int readCount;
                            while ((readCount = bis.read(buffer)) > 0)
                            {
                                outputStream.write(buffer, 0, readCount);
                            }
                            outputStream.flush();
                            outputStream.close();
                        }
                    }
                    test = true;
                }
                if (test) break;
            }
        }
    } catch (Exception ex)
    {
        ex.printStackTrace();
        System.out.println("Exception ----->" + ex.getMessage());
    } finally
    {
        try
        {
            if (bis != null)
            {
                bis.close();
            }
            if (is != null)
            {
                is.close();
            }
        } catch (IOException e)
        {
            e.printStackTrace();
        }
        try
        {
            ftp.disconnect();
            ftp = null;
        } catch (IOException e)
        {
            e.printStackTrace();
        }
    }
    return test;
}
Why bother iterating through the full list? You already know which filename you want, and you use it when you call is = ftp.retrieveFileStream(strFile);. Just call that directly, without ever calling listFiles().
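A minimal sketch of that, reusing the variables from the method above. Note this assumes the name stored on the server matches fileName exactly; unlike the equalsIgnoreCase loop, a direct retrieve is case-sensitive on most servers.
ftp.enterLocalPassiveMode();
ftp.setFileType(FTP.BINARY_FILE_TYPE);
InputStream is = ftp.retrieveFileStream(READ_DIRECTORY + "/" + fileName);
if (is != null) {
    // ... stream to the servlet response exactly as before ...
    is.close();
    ftp.completePendingCommand(); // finish the transfer on the control connection
} else {
    // e.g. reply code 550 if the file does not exist
    System.out.println("Retrieve failed, FTP reply: " + ftp.getReplyCode());
}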
I think there are two ways, depending on how you are going to use it.
Instead of using "listFiles", which returns the file names plus their info, you can use "listNames" to get only the files' names:
String[] ftpFiles = ftp.listNames(READ_DIRECTORY);
// then loop over the String array, as sketched below
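For example, the loop could look like this (a sketch; fileName and print are as in the question):
if (ftpFiles != null) {
    for (String name : ftpFiles) {
        if (name.equalsIgnoreCase(fileName)) {
            print(name);
            break;
        }
    }
}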
Or, use an FTPFileFilter with listFiles:
FTPFileFilter filter = new FTPFileFilter() {
    @Override
    public boolean accept(FTPFile ftpFile) {
        return (ftpFile.isFile() &&
                ftpFile.getName().equalsIgnoreCase(fileName));
    }
};
FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY, filter);
if (ftpFiles != null && ftpFiles.length > 0) {
    for (FTPFile aFile : ftpFiles) {
        print(aFile.getName());
    }
}
Do not use a for-each loop. Use a regular for loop where you can actually control the direction:
for (int i = ftpFiles.length - 1; i >= 0; i--) {
    print(ftpFiles[i]);
}
I have a log analyzing tool that needs to grab *.gz files from Linux servers and unzip them on both Linux and Windows clients. I am getting "Unexpected end of ZLIB input stream" in many instances, which I assume comes from some difference in how the files are handled on Linux and Windows.
Below is my function. It's pretty basic. How do I improve it to prevent the EOF error?
The "in" symbol is a FileInputStream that is created when constructing the class that this function is part of.
public void unzip(File fileTo) throws IOException {
    OutputStream out = new FileOutputStream(fileTo);
    LOGGER.info("Setting up the file for outputstream : " + fileTo);
    try {
        in = new GZIPInputStream(in);
        byte[] buffer = new byte[65536];
        int noRead;
        while ((noRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, noRead);
        }
    } finally {
        try { out.close(); } catch (Exception e) {}
    }
}
I changed from the above to this, and now it works. It seems that it was trying to write to the output stream before it was done reading the input stream.
public void unzip(File fileTo, String f) throws IOException,
        EOFException, InterruptedException {
    LOGGER.info("Setting up the file for outputstream : " + fileTo);
    GZIPInputStream cIn = new GZIPInputStream(new FileInputStream(f));
    OutputStream out = new FileOutputStream(fileTo);
    fileTo.setReadable(true, false);
    fileTo.setWritable(true, false);
    byte[] buffer = new byte[65536];
    int noRead;
    for (int i = 10; i > 0 && cIn.available() == 1; i--) {
        Thread.sleep(1000);
    }
    try {
        while ((noRead = cIn.read(buffer)) != -1) {
            out.write(buffer, 0, noRead);
        }
    } finally {
        try { out.close(); cIn.close(); in.close(); } catch (Exception e) {}
    }
}
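For reference, the same copy can also be written with try-with-resources, which guarantees both streams are closed even on an exception (a sketch; the setReadable/setWritable calls and the wait loop from the method above are omitted):
public void unzip(File fileTo, String f) throws IOException {
    try (GZIPInputStream cIn = new GZIPInputStream(new FileInputStream(f));
         OutputStream out = new FileOutputStream(fileTo)) {
        byte[] buffer = new byte[65536];
        int noRead;
        while ((noRead = cIn.read(buffer)) != -1) {
            out.write(buffer, 0, noRead);
        }
    }
}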
Question at the bottom.
I'm using netty to transfer a file to another server.
I limit my file chunks to 1024*64 bytes (64KB) because of the WebSocket protocol. The following method is a local example of what will happen to the file:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024 * 64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        while (is.read(buf) > 0) {
            os.write(buf);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if (is != null && os != null) {
                is.close();
                os.close();
            }
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
The file is loaded by the InputStream into a byte buffer and written directly to the OutputStream.
The content of the file cannot change during this process.
To get the MD5 hashes of the file, I've written the following method:
public static String checksum(File file) {
    InputStream is = null;
    try {
        is = new FileInputStream(file);
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read = 0;
        while ((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
        return new BigInteger(1, digest.digest()).toString(16);
    } catch (IOException | NoSuchAlgorithmException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
    return null;
}
So, just in theory, it should return the same hash, shouldn't it? The problem is that it returns two different hashes, which do not vary between runs. The file size stays the same, and so does the content.
When I run the method once with in: file-1, out: file-2, and again with in: file-2 and out: file-3, the hashes of file-2 and file-3 are the same! This means the method changes the file in the same way every time.
1. 58a4a9fbe349a9e0af172f9cf3e6050a
2. 7b3f343fa1b8c4e1160add4c48322373
3. 7b3f343fa1b8c4e1160add4c48322373
Here is a little test that compares all the buffers for equivalence. The test is positive, so there aren't any differences:
File file1 = new File("controller/templates/Example.zip");
File file2 = new File("controller/templates2/Example.zip");
try {
    byte[] buf1 = new byte[1024 * 64];
    byte[] buf2 = new byte[1024 * 64];
    FileInputStream is1 = new FileInputStream(file1);
    FileInputStream is2 = new FileInputStream(file2);
    boolean run = true;
    while (run) {
        int read1 = is1.read(buf1), read2 = is2.read(buf2);
        String result1 = Arrays.toString(buf1), result2 = Arrays.toString(buf2);
        boolean test = result1.equals(result2);
        System.out.println("1: " + result1);
        System.out.println("2: " + result2);
        System.out.println("--- TEST RESULT: " + test + " ----------------------------------------------------");
        if (!(read1 > 0 && read2 > 0) || !test) run = false;
    }
} catch (IOException e) {
    e.printStackTrace();
}
Question: Can you help me chunk the file without changing the hash?
while (is.read(buf) > 0) {
    os.write(buf);
}
The read() method with the array argument returns the number of bytes read from the stream. When the file doesn't end exactly at a multiple of the byte array length, this return value will be smaller than the byte array length, because you have reached the end of the file.
However, your os.write(buf); call writes the whole byte array to the stream, including the leftover bytes beyond the end of the data. This means the written file ends up bigger, and therefore the hash changes.
Interestingly, you didn't make this mistake when you updated the message digest:
while ((read = is.read(buffer)) > 0) {
    digest.update(buffer, 0, read);
}
You just have to do the same when you "rechunk" your files.
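Applied to the copy loop in rechunck, that means passing the read count to write (a sketch using the buf, is, and os from the question):
int length;
while ((length = is.read(buf)) > 0) {
    os.write(buf, 0, length);
}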
Your rechunk method has a bug in it. Since you have a fixed buffer in there, your file is split into ByteArray-parts. but the last part of the file can be smaller than the buffer, which is why you write too many bytes in the new file. and that's why you do not have the same checksum anymore. the error can be fixed like this:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024 * 64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        int length;
        while ((length = is.read(buf)) > 0) {
            os.write(buf, 0, length);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if (is != null)
                is.close();
            if (os != null)
                os.close();
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
Thanks to the length variable, the write method knows that only the first length bytes of the array belong to the file; beyond that, the buffer still contains old bytes from the previous read that no longer belong to the file.
I have been writing an updater for my game.
It checks a .version file on Dropbox and compares it to the local .version file.
If any link is missing from the local version of the file, it downloads the required links one by one.
This is the error that it shows:
Exception in thread "Thread-9" java.lang.OutOfMemoryError: Java heap space
at com.fox.listeners.ButtonListener.readFile(ButtonListener.java:209)
at com.fox.listeners.ButtonListener.readFile(ButtonListener.java:204)
at com.fox.listeners.ButtonListener.UpdateStart(ButtonListener.java:132)
at com.fox.listeners.ButtonListener$1.run(ButtonListener.java:58)
It only shows up on some computers, though, not all of them. This is the readFile method:
private byte[] readFile(URL u) throws IOException {
    return readFile(u, getFileSize(u));
}

private static byte[] readFile(URL u, int size) throws IOException {
    byte[] data = new byte[size];
    int index = 0, read = 0;
    try {
        HttpURLConnection conn = (HttpURLConnection) u.openConnection();
        conn.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");
        InputStream is = conn.getInputStream();
        progress_a = 0;
        progress_b = data.length;
        while (index < data.length) {
            read = is.read(data, index, size - index);
            index += read;
            progress_a = index;
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return data;
}
private byte[] readFile(File f) {
    byte[] data = null;
    try {
        data = new byte[(int) f.length()];
        @SuppressWarnings("resource")
        DataInputStream dis = new DataInputStream(new FileInputStream(f));
        dis.readFully(data);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return data;
}
This is the main method that is run:
public void UpdateStart() {
    System.out.println("Starting Updater..");
    if (new File(cache_dir).exists() == false) {
        System.out.print("Creating cache dir.. ");
        while (new File(cache_dir).mkdir() == false);
        System.out.println("Done");
    }
    try {
        version_live = new Version(new URL(version_file_live));
    } catch (MalformedURLException e) {
        e.printStackTrace();
    }
    version_local = new Version(new File(version_file_local));
    Version updates = version_live.differences(version_local);
    System.out.println("Updated");
    int i = 1;
    try {
        byte[] b = null, data = null;
        FileOutputStream fos = null;
        BufferedWriter bw = null;
        for (String s : updates.files) {
            if (s.equals(""))
                continue;
            System.out.println("Reading file " + s);
            AppFrame.pbar.setString("Downloading file " + i + " of " + updates.files.size());
            if (progress_b > 0) {
                s = s + " " + (progress_a * 1000L / progress_b / 10.0) + "%";
            }
            b = readFile(new URL(s));
            progress_a = 0;
            progress_b = b.length;
            AppFrame.pbar.setString("Unzipping file " + i++ + " of " + updates.files.size());
            ZipInputStream zipStream = new ZipInputStream(new ByteArrayInputStream(b));
            File f = null, parent = null;
            ZipEntry entry = null;
            int read = 0, entry_read = 0;
            long entry_size = 0;
            progress_b = 0;
            while ((entry = zipStream.getNextEntry()) != null)
                progress_b += entry.getSize();
            zipStream = new ZipInputStream(new ByteArrayInputStream(b));
            while ((entry = zipStream.getNextEntry()) != null) {
                f = new File(cache_dir + entry.getName());
                if (entry.isDirectory())
                    continue;
                System.out.println("Making file " + f.toString());
                parent = f.getParentFile();
                if (parent != null && !parent.exists()) {
                    System.out.println("Trying to create directory " + parent.getAbsolutePath());
                    while (parent.mkdirs() == false);
                }
                entry_read = 0;
                entry_size = entry.getSize();
                data = new byte[1024];
                fos = new FileOutputStream(f);
                while (entry_read < entry_size) {
                    read = zipStream.read(data, 0, (int) Math.min(1024, entry_size - entry_read));
                    entry_read += read;
                    progress_a += read;
                    fos.write(data, 0, read);
                }
                fos.close();
            }
            bw = new BufferedWriter(new FileWriter(new File(version_file_local), true));
            bw.write(s);
            bw.newLine();
            bw.close();
        }
    } catch (Exception e) {
        e.printStackTrace();
        return;
    }
    System.out.println(version_live);
    System.out.println(version_local);
    System.out.println(updates);
    CacheUpdated = true;
    if (CacheUpdated) {
        AppFrame.pbar.setString("All Files are downloaded click Launch to play!");
    }
}
I don't get why it works for some of my players while for others it does not. I have been trying to fix this all day and I am just stumped at this point, but it seems like this is the only big issue left for me to fix.
Either increase the memory allocated to your JVM (How can I increase the JVM memory?), or make sure that the file being loaded into memory isn't gigantic (if it is, you'll need to find an alternate solution, or just read chunks of it at a time instead of loading the entire thing into memory).
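"Read chunks at a time" could look like the sketch below: stream the download straight to disk so memory use stays constant no matter how large the zip is. Here u is the URL from readFile, and "update.zip" is only a placeholder target path:
try (InputStream in = u.openStream()) {
    // copies in chunks internally; never holds the whole file in memory
    Files.copy(in, Paths.get("update.zip"), StandardCopyOption.REPLACE_EXISTING);
}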
Do your update in several steps. Here's some pseudo-code using Java 8. It's way shorter than what you wrote, because Java has a lot of built-in tools that you are re-writing much less efficiently.
// Download
Path zipDestination = Paths.get(...);
try (InputStream in = source.openStream()) {
    Files.copy(in, zipDestination);
}
// Unzip
try (ZipFile zipFile = new ZipFile(zipDestination.toFile())) {
    for (ZipEntry e : Collections.list(zipFile.entries())) {
        Path entryDestination = Paths.get(...);
        Files.copy(zipFile.getInputStream(e), entryDestination);
    }
}
// Done.
I have written a downloader which should be used to download text files, as well as images. So I download the files as binaries. Many of the downloads work very well, but some parts of the text files and many image files are corrupted. The errors always occur in the same files and at the same places (as far as I can tell from analysing the text files). I used this code for downloading:
public File downloadFile(HttpURLConnection connection) {
    return writeFileDataToFile(getFileData(connection));
}

//downloads the data of the file and returns the content as string
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8 * 1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(fileChunk));
                fileChunk = new byte[8 * 1024];
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; // shouldn't be reached
    }
}
//writes data to the file
private File writeFileDataToFile(List<Byte> fileData) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            this.file.createNewFile();
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            System.exit(1);
        }
    }
    try (OutputStream output = new FileOutputStream(file)) {
        output.write(Bytes.toArray(fileData));
        return file;
    } catch (IOException e) {
        System.out.println("Error while accessing file at " + file.getPath());
        System.exit(1);
        return null;
    }
}
I would suggest not going through a List of Byte, since you create a list of Byte from an array only to turn it back into an array of bytes, which is not really efficient.
Moreover, you wrongly assume the chunk size (it is not necessarily 8192 bytes).
Why don't you just do something like this:
private File writeFileDataToFile(HttpURLConnection connection) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            //this.file.createNewFile(); // not needed, will be created at FileOutputStream
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            //System.exit(1);
            // instead throw an exception or return null
            throw new YourException(message);
        }
    }
    OutputStream output = null;
    InputStream input = null;
    try {
        output = new FileOutputStream(file);
        input = connection.getInputStream();
        byte[] fileChunk = new byte[8 * 1024];
        int bytesRead;
        while ((bytesRead = input.read(fileChunk)) != -1) {
            output.write(fileChunk, 0, bytesRead);
        }
        return file;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        // System.exit(1); // you should avoid such an exit
        // instead throw an exception or return null
        throw new YourException(message);
    } finally {
        if (input != null) {
            try {
                input.close();
            } catch (Exception e2) {} // ignore
        }
        if (output != null) {
            try {
                output.close();
            } catch (Exception e2) {} // ignore
        }
    }
}
The failure was adding the whole fileChunk array to the file data, even if it wasn't completely filled by the read operation.
Fix:
//downloads the data of the file and returns the content as string
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8 * 1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; // shouldn't be reached
    }
}
Where the relevant change is changing
if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(fileChunk));
    fileChunk = new byte[8*1024];
}
into
if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
}
I'm trying to extract 2 jar files from the currently running jar; however, they always end up at 2KB even though their sizes are 104KB and 1.7MB. Here's what I've got:
public static boolean extractFromJar(String fileName, String dest) {
    if (Configuration.getRunningJarPath() == null) {
        return false;
    }
    File file = new File(dest + fileName);
    if (file.exists()) {
        return false;
    }
    if (file.isDirectory()) {
        file.mkdir();
        return false;
    }
    try {
        JarFile jar = new JarFile(Configuration.getRunningJarPath());
        Enumeration<JarEntry> e = jar.entries();
        while (e.hasMoreElements()) {
            JarEntry je = e.nextElement();
            InputStream in = new BufferedInputStream(jar.getInputStream(je));
            OutputStream out = new BufferedOutputStream(new FileOutputStream(file));
            copyInputStream(in, out);
        }
        return true;
    } catch (Exception e) {
        Methods.debug(e);
        return false;
    }
}

private final static void copyInputStream(InputStream in, OutputStream out)
        throws IOException {
    while (in.available() > 0) {
        out.write(in.read());
    }
    out.flush();
    out.close();
    in.close();
}
This should work better than relying on the InputStream.available() method:
private final static void copyInputStream(InputStream in, OutputStream out)
        throws IOException {
    byte[] buff = new byte[4096];
    int n;
    while ((n = in.read(buff)) > 0) {
        out.write(buff, 0, n);
    }
    out.flush();
    out.close();
    in.close();
}
The available() method is not reliable for reading data, as it is just an estimate, per its documentation.
You need to depend on the read() method, looping until it no longer returns a positive count:
byte[] contentBytes = new byte[4096];
int bytesRead = -1;
while ((bytesRead = inputStream.read(contentBytes)) > 0)
{
    out.write(contentBytes, 0, bytesRead);
} // while available
You can read through a discussion of the problems with available() here.
I'm not sure about extracting jars specifically, but every jar is actually a zip file, so you can try unzipping it.
You can find out about unzipping in Java here:
How to unzip files recursively in Java?
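Combining the two answers above into one sketch: open the running jar, look up the single entry you want by name, and copy it with a read()-driven loop instead of available(). Here entryName is a hypothetical parameter for the file's path inside the jar; Configuration.getRunningJarPath() and Methods.debug are from the question's code.
public static boolean extractEntry(String entryName, File dest) {
    try (JarFile jar = new JarFile(Configuration.getRunningJarPath())) {
        JarEntry entry = jar.getJarEntry(entryName);
        if (entry == null) {
            return false; // no such entry in this jar
        }
        try (InputStream in = new BufferedInputStream(jar.getInputStream(entry));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dest))) {
            byte[] buff = new byte[4096];
            int n;
            while ((n = in.read(buff)) > 0) { // loop on read(), not available()
                out.write(buff, 0, n);
            }
        }
        return true;
    } catch (IOException e) {
        Methods.debug(e);
        return false;
    }
}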