Files downloaded as binary with Java are corrupted

I have written a downloader that should be usable for text files as well as images, so I download the files as binaries. Many of the downloads work fine, but parts of some text files and many image files are corrupted. The errors always occur in the same files and at the same places (as far as I can tell from analysing the text files). I used this code for downloading:
public File downloadFile(HttpURLConnection connection) {
    return writeFileDataToFile(getFileData(connection));
}

//downloads the data of the file and returns the content as a list of bytes
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(fileChunk));
                fileChunk = new byte[8*1024];
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; //shouldn't be reached
    }
}
//writes data to the file
private File writeFileDataToFile(List<Byte> fileData) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            this.file.createNewFile();
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            System.exit(1);
        }
    }
    try (OutputStream output = new FileOutputStream(file)) {
        output.write(Bytes.toArray(fileData));
        return file;
    } catch (IOException e) {
        System.out.println("Error while accessing file at " + file.getPath());
        System.exit(1);
        return null;
    }
}

I would suggest not going through a List of Byte, since you create a list of Byte objects from an array just to turn it back into an array of bytes, which is not really efficient.
Moreover, you wrongly assume that every chunk is full (a read does not necessarily fill all 8192 bytes).
Why don't you just do something like this:
private File writeFileDataToFile(HttpURLConnection connection) {
    if (!this.file.exists()) {
        try {
            this.file.getParentFile().mkdirs();
            //this.file.createNewFile(); // not needed, the file is created by FileOutputStream
        } catch (IOException e) {
            System.out.println("Error while creating file at " + file.getPath());
            //System.exit(1);
            // instead throw an exception or return null
            throw new YourException(message);
        }
    }
    OutputStream output = null;
    InputStream input = null;
    try {
        output = new FileOutputStream(file);
        input = connection.getInputStream();
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        while ((bytesRead = input.read(fileChunk)) != -1) {
            output.write(fileChunk, 0, bytesRead);
        }
        return file;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        // System.exit(1); // you should avoid such an exit
        // instead throw an exception or return null
        throw new YourException(message);
    } finally {
        if (input != null) {
            try {
                input.close();
            } catch (Exception e2) {} // ignore
        }
        if (output != null) {
            try {
                output.close();
            } catch (Exception e2) {} // ignore
        }
    }
}

The failure was adding the whole fileChunk array to fileData, even when it wasn't completely filled by the read operation.
Fix:
//downloads the data of the file and returns the content as a list of bytes
private List<Byte> getFileData(HttpURLConnection connection) {
    List<Byte> fileData = new ArrayList<>();
    try (InputStream input = connection.getInputStream()) {
        byte[] fileChunk = new byte[8*1024];
        int bytesRead;
        do {
            bytesRead = input.read(fileChunk);
            if (bytesRead != -1) {
                fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
            }
        } while (bytesRead != -1);
        return fileData;
    } catch (IOException e) {
        System.out.println("Receiving file at " + url.toString() + " failed");
        System.exit(1);
        return null; //shouldn't be reached
    }
}
Where the relevant change is changing
if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(fileChunk));
    fileChunk = new byte[8*1024];
}
into
if (bytesRead != -1) {
    fileData.addAll(Bytes.asList(Arrays.copyOf(fileChunk, bytesRead)));
}
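As an aside (not part of the fix): if you don't need the bytes in memory, the whole list-and-buffer dance can be avoided by streaming straight to disk. A minimal sketch for Java 7+, reusing the question's file field and connection parameter:
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public File downloadFile(HttpURLConnection connection) throws IOException {
    this.file.getParentFile().mkdirs();
    try (InputStream input = connection.getInputStream()) {
        // streams the response body to disk without holding it all in memory
        Files.copy(input, this.file.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
    return this.file;
}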

Related

File md5 hash changes when chunking it (for netty transfer)

Question at the bottom
I'm using netty to transfer a file to another server.
I limit my file chunks to 1024*64 bytes (64 KB) because of the WebSocket protocol. The following method is a local example of what will happen to the file:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        while(is.read(buf) > 0) {
            os.write(buf);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null && os != null) {
                is.close();
                os.close();
            }
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
The file is loaded by the InputStream into a byte buffer and written directly to the OutputStream.
The content of the file cannot change during this process.
To get the md5 hashes of the file I've written the following method:
public static String checksum(File file) {
    InputStream is = null;
    try {
        is = new FileInputStream(file);
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        int read = 0;
        while((read = is.read(buffer)) > 0) {
            digest.update(buffer, 0, read);
        }
        return new BigInteger(1, digest.digest()).toString(16);
    } catch(IOException | NoSuchAlgorithmException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            is.close();
        } catch(IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
    return null;
}
So: in theory it should return the same hash, shouldn't it? The problem is that it returns two different hashes, and they do not differ between runs. The file size stays the same, and so does the content.
When I run the method once with in: file-1, out: file-2 and again with in: file-2 and out: file-3, the hashes of file-2 and file-3 are the same! This means the method changes the file in the same way every time.
1. 58a4a9fbe349a9e0af172f9cf3e6050a
2. 7b3f343fa1b8c4e1160add4c48322373
3. 7b3f343fa1b8c4e1160add4c48322373
Here is a little test that compares all buffers for equality. The test passes, so there are no differences.
File file1 = new File("controller/templates/Example.zip");
File file2 = new File("controller/templates2/Example.zip");
try {
    byte[] buf1 = new byte[1024*64];
    byte[] buf2 = new byte[1024*64];
    FileInputStream is1 = new FileInputStream(file1);
    FileInputStream is2 = new FileInputStream(file2);
    boolean run = true;
    while(run) {
        int read1 = is1.read(buf1), read2 = is2.read(buf2);
        String result1 = Arrays.toString(buf1), result2 = Arrays.toString(buf2);
        boolean test = result1.equals(result2);
        System.out.println("1: " + result1);
        System.out.println("2: " + result2);
        System.out.println("--- TEST RESULT: " + test + " ----------------------------------------------------");
        if(!(read1 > 0 && read2 > 0) || !test) run = false;
    }
} catch (IOException e) {
    e.printStackTrace();
}
Question: Can you help me chunk the file without changing the hash?
while(is.read(buf) > 0) {
    os.write(buf);
}
The read() method with the array argument returns the number of bytes read from the stream. When the file length is not an exact multiple of the byte array length, the last return value will be smaller than the array length because you reached the end of the file.
However, your os.write(buf); call writes the whole byte array to the stream, including the stale bytes after the end of the file. This means the written file is padded up to a multiple of the buffer size, and therefore the hash changes.
Interestingly you didn't make the mistake when you updated the message digest:
while((read = is.read(buffer)) > 0) {
    digest.update(buffer, 0, read);
}
You just have to do the same when you "rechunk" your files.
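Applied to your copy loop, that means capturing the read count and passing it to write (this is exactly what the fixed method in the next answer does):
int read;
while ((read = is.read(buf)) > 0) {
    os.write(buf, 0, read);
}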
Your rechunck method has a bug in it. Since you use a fixed-size buffer, your file is split into buffer-sized parts, but the last part of the file can be smaller than the buffer, which is why you write too many bytes into the new file. That is why you no longer get the same checksum. The error can be fixed like this:
public static void rechunck(File file1, File file2) {
    FileInputStream is = null;
    FileOutputStream os = null;
    try {
        byte[] buf = new byte[1024*64];
        is = new FileInputStream(file1);
        os = new FileOutputStream(file2);
        int length;
        while((length = is.read(buf)) > 0) {
            os.write(buf, 0, length);
        }
    } catch (IOException e) {
        Controller.handleException(Thread.currentThread(), e);
    } finally {
        try {
            if(is != null)
                is.close();
            if(os != null)
                os.close();
        } catch (IOException e) {
            Controller.handleException(Thread.currentThread(), e);
        }
    }
}
Thanks to the length variable, the write method knows that only the bytes up to index length belong to the file; beyond that, the buffer still contains old bytes from the previous read that no longer belong to the file.
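A quick way to confirm the fix, reusing the checksum method and file paths from the question (just a sanity check, not production code):
File file1 = new File("controller/templates/Example.zip");
File file2 = new File("controller/templates2/Example.zip");
rechunck(file1, file2);
// With the length-aware loop, both digests should now match.
System.out.println(checksum(file1).equals(checksum(file2))); // expected: true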

Unzipping a directory/file in Java

I have the following code:
public static void unzip(final File archive) throws FileNotFoundException, IOException
{
    ZipInputStream zipInput = null;
    try
    {
        zipInput = new ZipInputStream(new FileInputStream(archive));
        ZipEntry zipEntry = null;
        while ((zipEntry = zipInput.getNextEntry()) != null)
        {
            String ename = zipEntry.getName();
            final int pos = ename.lastIndexOf(File.separatorChar);
            if (pos >= 0)
            {
                ename = ename.substring(pos + 1);
            }
            final FileOutputStream outputFile = new FileOutputStream(archive.getParent() + File.separatorChar + ename);
            int data = 0;
            try
            {
                while ((data = zipInput.read()) != -1)
                {
                    outputFile.write(data);
                }
            } catch (final Exception e)
            {
                LOGGER.error(e);
            } finally
            {
                outputFile.close();
            }
        }
    } catch (final Exception e)
    {
        LOGGER.error("Error when unzipping file ( " + archive.getPath() + " )", e);
    } finally
    {
        if (zipInput != null)
        {
            zipInput.close();
        }
    }
}
What I would like to know is, what does it mean when I get the value -1 from the following line:
(data = zipInput.read()) != -1
I'm guessing it's the reason why the zip file is not being unzipped properly.
It's the expected value returned by an InputStream that has no content left to read.
From InputStream's javadoc:
Returns:
the next byte of data, or -1 if the end of the stream is reached.
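So the -1 is normal end-of-data, not an error by itself. As an aside, reading the entry one byte at a time works but is slow; the buffered pattern used elsewhere on this page applies to ZipInputStream too. A minimal sketch of the inner copy loop (8 KB buffer chosen arbitrarily):
byte[] buffer = new byte[8192];
int count;
// read(byte[]) also returns -1 at the end of the current zip entry
while ((count = zipInput.read(buffer)) != -1)
{
    outputFile.write(buffer, 0, count);
}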

Retrieve file from FTP when the directory has too many files

I am trying to get a list of FTP files by using
FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY);
There are more than 27553 files in this directory and I expect the number to grow.
Now I need to retrieve one file from this huge list. I am doing the following:
for (FTPFile ftpFile : ftpFiles)
{
    if (fileName.equalsIgnoreCase(ftpFile.getName()))
    {
        print(ftpFile);
    }
}
But let's say the file I want to print is the last of the 27553 files: it takes about a minute to go through the whole list checking whether each entry is the file I'm looking for. Not only that, it also gives me an "org.apache.commons.net.ftp.FTPConnectionClosedException: FTP response 421 received. Server closed connection." after about 900 seconds.
How can I tune this program to find the file faster? I don't want it to run for 900 seconds. Below is the actual method that takes so long; please suggest how I can reduce the time taken. In debug mode the method runs for hundreds of seconds, and on a production server it takes more than a minute or two, which is still not acceptable.
private boolean PDFReport(ActionForm form, HttpServletResponse response,
        String fileName, String READ_DIRECTORY) throws Exception
{
    boolean test = false;
    FTPClient ftp = new FTPClient();
    DataSourceReader dsr = new DataSourceReader();
    dsr.getFtpLinks();
    String ftppassword = dsr.getFtppassword();
    String ftpserver = dsr.getFtpserver();
    String ftpusername = dsr.getFtpusername();
    ftp.connect(ftpserver);
    ftp.login(ftpusername, ftppassword);
    InputStream is = null;
    BufferedInputStream bis = null;
    try
    {
        int reply;
        reply = ftp.getReplyCode();
        if (!FTPReply.isPositiveCompletion(reply))
        {
            ftp.disconnect();
            System.out.println("FTP server refused connection.");
        } else
        {
            ftp.enterLocalPassiveMode();
            FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY);
            for (FTPFile ftpFile : ftpFiles)
            {
                String FilePdf = ftpFile.getName();
                ftp.setFileType(FTP.BINARY_FILE_TYPE);
                if (FilePdf.equalsIgnoreCase(fileName))
                {
                    String strFile = READ_DIRECTORY + "/" + FilePdf;
                    boolean fileFormatType = fileName.endsWith(".PDF");
                    if (fileFormatType)
                    {
                        if (FilePdf != null && FilePdf.length() > 0)
                        {
                            is = ftp.retrieveFileStream(strFile);
                            bis = new BufferedInputStream(is);
                            response.reset();
                            response.setContentType("application/pdf");
                            response.setHeader("Content-Disposition",
                                    "inline;filename=example.pdf");
                            ServletOutputStream outputStream = response.getOutputStream();
                            byte[] buffer = new byte[1024];
                            int readCount;
                            while ((readCount = bis.read(buffer)) > 0)
                            {
                                outputStream.write(buffer, 0, readCount);
                            }
                            outputStream.flush();
                            outputStream.close();
                        }
                    } else
                    {
                        if (FilePdf != null && FilePdf.length() > 0)
                        {
                            is = ftp.retrieveFileStream(strFile);
                            bis = new BufferedInputStream(is);
                            response.reset();
                            response.setContentType("application/TIFF");
                            response.setHeader("Content-Disposition",
                                    "inline;filename=example.tiff");
                            ServletOutputStream outputStream = response.getOutputStream();
                            byte[] buffer = new byte[1024];
                            int readCount;
                            while ((readCount = bis.read(buffer)) > 0)
                            {
                                outputStream.write(buffer, 0, readCount);
                            }
                            outputStream.flush();
                            outputStream.close();
                        }
                    }
                    test = true;
                }
                if (test) break;
            }
        }
    } catch (Exception ex)
    {
        ex.printStackTrace();
        System.out.println("Exception ----->" + ex.getMessage());
    } finally
    {
        try
        {
            if (bis != null)
            {
                bis.close();
            }
            if (is != null)
            {
                is.close();
            }
        } catch (IOException e)
        {
            e.printStackTrace();
        }
        try
        {
            ftp.disconnect();
            ftp = null;
        } catch (IOException e)
        {
            e.printStackTrace();
        }
    }
    return test;
}
Why bother iterating through the full list? You already know the filename you want, and you use it when you call is = ftp.retrieveFileStream(strFile);. Just call that directly without ever calling listFiles().
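A minimal sketch of that idea (names such as READ_DIRECTORY and fileName are taken from the question; error handling and the response streaming are trimmed):
ftp.enterLocalPassiveMode();
ftp.setFileType(FTP.BINARY_FILE_TYPE);
InputStream is = ftp.retrieveFileStream(READ_DIRECTORY + "/" + fileName);
if (is == null)
{
    // file not found or transfer refused; inspect ftp.getReplyCode()
} else
{
    // ... copy 'is' to the servlet response as in the question ...
    is.close();
    ftp.completePendingCommand(); // must be called after retrieveFileStream
}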
I think there are two ways, depending on how you are going to use it.
Instead of using listFiles to get the file names together with their details, use listNames to get only the file names:
String[] ftpFiles = ftp.listNames(READ_DIRECTORY);
// loop over the string array
Or use an FTPFileFilter with listFiles:
FTPFileFilter filter = new FTPFileFilter() {
    @Override
    public boolean accept(FTPFile ftpFile) {
        return ftpFile.isFile() &&
                ftpFile.getName().equalsIgnoreCase(fileName);
    }
};
FTPFile[] ftpFiles = ftp.listFiles(READ_DIRECTORY, filter);
if (ftpFiles != null && ftpFiles.length > 0) {
    for (FTPFile aFile : ftpFiles) {
        print(aFile.getName());
    }
}
Do not use a for-each loop. Use a regular for loop where you can actually control the direction.
for (int i = ftpFiles.length - 1; i >= 0; i--) {
    print(ftpFiles[i]);
}

Blocking multithreaded TCP Server NullPointerException debugging

I have a TCP server that accepts data and saves it to a text file. It then uses that text file to create an image and sends it back to the client. Every couple of hours I will get a NullPointerException that gets thrown to every client that connects after that. I am not sure how to go about debugging this as I cannot replicate it on my own.
Does anyone have any debugging practices to help me figure out why this is becoming a problem?
The server is running Ubuntu 12.04 i386 with 2 GB of RAM. My initial suspicion is that something is not getting closed properly and is creating issues, but as far as I can tell everything is getting closed.
ServerSocket echoServer = null;
Socket clientSocket = null;
try {
    echoServer = new ServerSocket(xxx);
} catch (IOException e) {
    System.out.println(e);
}
while (true)
{
    InputStream is = null;
    FileOutputStream fos = null;
    BufferedOutputStream bos = null;
    int bufferSize = 0;
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    BufferedOutputStream out = null;
    try {
        //Receive text file
        is = null;
        fos = null;
        bos = null;
        bufferSize = 0;
        String uid = createUid();
        try {
            clientSocket = echoServer.accept();
            clientSocket.setKeepAlive(true);
            clientSocket.setSoTimeout(10000);
            System.out.println("Client accepted from: " + clientSocket.getInetAddress());
        } catch (IOException ex) {
            System.out.println("Can't accept client connection. ");
        }
        try {
            is = clientSocket.getInputStream();
            bufferSize = clientSocket.getReceiveBufferSize();
            System.out.println("Buffer size: " + bufferSize);
        } catch (IOException ex) {
            System.out.println("Can't get socket input stream. ");
        }
        try {
            fos = new FileOutputStream("/my/diretory/" + uid + ".txt");
            bos = new BufferedOutputStream(fos);
        } catch (FileNotFoundException ex) {
            System.out.println("File not found. ");
        }
        byte[] bytes = new byte[bufferSize];
        int count;
        while ((count = is.read(bytes)) > 0) {
            bos.write(bytes, 0, count);
            System.out.println("Receiving... " + count);
        }
        System.out.println("Done receiving text file");
        bos.flush();
        bos.close();
        fos.close();
        //image
        String[] command = new String[3];
        command[0] = "python";
        command[1] = "imagecreationfile.py";
        command[2] = uid;
        System.out.println("Starting python script");
        Boolean success = startScript(command);
        if (success)
        {
            System.out.println("Script completed successfully");
            //Send image here
            String image = "/my/directory/" + uid + ".png";
            File imageFile = new File(image);
            long length = imageFile.length();
            if (length > Integer.MAX_VALUE) {
                System.out.println("File is too large.");
            }
            bytes = new byte[(int) length];
            fis = new FileInputStream(imageFile);
            bis = new BufferedInputStream(fis);
            out = new BufferedOutputStream(clientSocket.getOutputStream());
            count = 0;
            while ((count = bis.read(bytes)) > 0) {
                out.write(bytes, 0, count);
                System.out.println("Writing... " + count);
            }
            out.flush();
            out.close();
            fis.close();
            bis.close();
        }
        else
        {
            System.out.println("Script failed");
        }
        System.out.println("Closing connection");
        is.close();
        clientSocket.close();
    } catch (Exception e) {
        System.out.println(e); //This is where the exception is being caught
    }
    if (!clientSocket.isClosed())
    {
        try {
            clientSocket.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    try {
        if (is != null)
            is.close();
        if (fos != null)
            fos.close();
        if (bos != null)
            bos.close();
        if (fis != null)
            fis.close();
        if (bis != null)
            bis.close();
        if (out != null)
            out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Maybe an exception was thrown in one of your try-catch scopes, and the next try-catch scope then found null variables.
For example:
//Receive text file
is = null;
fos = null;
bos = null;
bufferSize = 0;
String uid = createUid();
try {
    clientSocket = echoServer.accept();
    clientSocket.setKeepAlive(true);
    clientSocket.setSoTimeout(10000);
    System.out.println("Client accepted from: " + clientSocket.getInetAddress());
} catch (IOException ex) {
    System.out.println("Can't accept client connection. ");
}
try {
    is = clientSocket.getInputStream();
    bufferSize = clientSocket.getReceiveBufferSize();
    System.out.println("Buffer size: " + bufferSize);
} catch (IOException ex) {
    System.out.println("Can't get socket input stream. ");
}
If an IOException is thrown at clientSocket = echoServer.accept();, it only prints "Can't accept client connection." and execution continues.
Then, when is = clientSocket.getInputStream(); is executed, it throws a NullPointerException because clientSocket was not initialized properly.
My suggestion: don't split a sequence of dependent statements across different try-catch scopes unless it is really necessary.
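A rough sketch of what that can look like (try-with-resources assumes Java 7+; the file receiving, script call, and image sending from the question are elided):
while (true) {
    try (Socket clientSocket = echoServer.accept()) {
        clientSocket.setKeepAlive(true);
        clientSocket.setSoTimeout(10000);
        InputStream is = clientSocket.getInputStream();
        // ... receive the text file, run the script, send the image ...
    } catch (IOException e) {
        // one failed client no longer poisons the next iteration
        e.printStackTrace();
    }
}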

DataOutputStream writing too much

What I currently have
I'm currently trying to create a little download manager in Java, and I have a problem with writing the loaded bytes to a file. I'm using a DataOutputStream to write the byte array which I read from a DataInputStream. Here is the class I created to do that:
public class DownloadThread extends Thread {
    private String url_s;
    private File datei;

    public DownloadThread(String url_s, File datei) {
        this.url_s = url_s;
        this.datei = datei;
    }

    public void run() {
        // Connect:
        int size = 0;
        URLConnection con = null;
        try {
            URL url = new URL(url_s);
            con = url.openConnection();
            size = con.getContentLength();
            // Set Min and Max of the JProgressBar
            prog.setMinimum(0);
            prog.setMaximum(size);
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        // Download:
        if (con != null || size != 0) {
            byte[] buffer = new byte[4096];
            DataInputStream down_reader = null;
            // Output:
            DataOutputStream out = null;
            try {
                out = new DataOutputStream(new FileOutputStream(datei));
            } catch (FileNotFoundException e1) {
                e1.printStackTrace();
            }
            // Load:
            try {
                down_reader = new DataInputStream(con.getInputStream());
                int byte_counter = 0;
                int tmp = 0;
                int progress = 0;
                // Read:
                while (true) {
                    tmp = down_reader.read(buffer);
                    // Check for EOF
                    if (tmp == -1) {
                        break;
                    }
                    out.write(buffer);
                    out.flush();
                    // Set Progress:
                    byte_counter += tmp;
                    progress = (byte_counter * 100) / size;
                    prog.setValue(byte_counter);
                    prog.setString(progress + "% - " + byte_counter + "/" + size + " Bytes");
                }
                // Check Filesize:
                prog.setString("Checking Integrity...");
                if (size == out.size()) {
                    prog.setString("Integrity Check passed!");
                } else {
                    prog.setString("Integrity Check failed!");
                    System.out.println("Size: " + size + "\n" +
                            "Read: " + byte_counter + "\n" +
                            "Written: " + out.size());
                }
                // ENDE
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try {
                    out.close();
                    down_reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        // Clean Up...
        load.setEnabled(true);
        try {
            this.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
This is currently an inner class, and the prog object is a JProgressBar from its enclosing class, so it can be accessed directly.
Example
I'm trying to download the Windows .exe version of TinyCC, which should be 281 KB in size. The file I downloaded with my download manager is 376 KB.
The output of the script looks like this:
Size: 287181
Read: 287181
Written: 385024
So it seems that the bytes read match the file size, but more bytes are written. What am I missing here?
This is wrong:
out.write(buffer);
It should be
out.write(buffer, 0, tmp);
You need to specify how many bytes to write; a read doesn't always fill the whole buffer. Your own numbers show it: 385024 written bytes is exactly 94 × 4096, i.e. each of the 94 reads was flushed as a full 4096-byte buffer even though most reads returned fewer bytes from the network.
Memorize this. It is the canonical way to copy a stream in Java.
int count;
while ((count = in.read(buffer)) > 0)
{
    out.write(buffer, 0, count);
}
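On Java 9 and newer the standard library provides this copy for you; a sketch, assuming the same con and datei as in the question:
try (InputStream in = con.getInputStream();
     OutputStream out = new FileOutputStream(datei)) {
    in.transferTo(out); // copies the stream fully; buffering handled internally
}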
