GZIPInputStream throws exception when reading GZIP file - java

I am trying to read files from a public anonymous FTP server and I am running into a problem. I can read the plain text files just fine, but when I try to read gzip files, I get this exception:
Exception in thread "main" java.util.zip.ZipException: invalid distance too far back
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at java_io_FilterInputStream$read.call(Unknown Source)
at GenBankFilePoc.main(GenBankFilePoc.groovy:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
I have tried downloading the file and using a FileInputStream wrapped in a GZIPInputStream and got the exact same problem, so I don't think it is a problem with the FTP client (which is apache).
Here is some test code that reproduces the problem. It is just trying to print to stdout:
FTPClient ftp = new FTPClient();
ftp.connect("ftp.ncbi.nih.gov");
ftp.login("anonymous", "");
InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));
try {
byte[] buffer = new byte[65536];
int noRead;
while ((noRead = is.read(buffer)) != -1) {
System.out.write(buffer, 0, noRead);
}
} finally {
is.close();
ftp.disconnect();
}
I cannot find any documentation on why this would be happening, and following it through the code in a debugger is not getting me anywhere. I feel like I am missing something obvious.
EDIT: I manually downloaded the file and read it in with a GZIPInputStream and was able to print it out just fine. I have tried this with two different Java FTP clients.

Ah, I found out what was wrong. You have to set the file type to FTP.BINARY_FILE_TYPE before calling retrieveFileStream. FTPClient defaults to ASCII mode, in which the returned stream translates line endings, and that corrupts binary data such as a gzip file.
The following code works:
FTPClient ftp = new FTPClient();
ftp.connect("ftp.ncbi.nih.gov");
ftp.login("anonymous", "");
ftp.setFileType(FTP.BINARY_FILE_TYPE);
InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));
try {
byte[] buffer = new byte[65536];
int noRead;
while ((noRead = is.read(buffer)) != -1) {
System.out.write(buffer, 0, noRead);
}
} finally {
is.close();
ftp.disconnect();
}
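To see why a line-ending translation breaks a gzip stream, here is a minimal sketch that uses only the JDK. It does not talk to an FTP server: `stripCrBeforeLf` is a hypothetical stand-in for what an ASCII-mode transfer may do on a Unix-like client (collapsing CR LF to LF), and `gzipStored` uses stored deflate blocks purely so that the CR LF bytes are guaranteed to appear in the compressed output. All class and method names are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class AsciiModeDemo {

    // Gzip the input using stored (uncompressed) deflate blocks so that the
    // payload bytes, including any CR LF pairs, appear verbatim in the output.
    static byte[] gzipStored(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out) {
            { def.setLevel(Deflater.NO_COMPRESSION); }
        }) {
            gz.write(data);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return out.toByteArray();
    }

    // Assumption: simulates an ASCII-mode download by dropping every CR
    // that is immediately followed by LF.
    static byte[] stripCrBeforeLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Returns the decompressed bytes, or null if decompression fails.
    static byte[] gunzipOrNull(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] original = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] intact = gzipStored(original);
        byte[] mangled = stripCrBeforeLf(intact);
        System.out.println("intact decodes:  " + (gunzipOrNull(intact) != null));
        System.out.println("mangled decodes: " + (gunzipOrNull(mangled) != null));
    }
}
```

With real compressed data the damage depends on whether the byte pair 0x0D 0x0A happens to occur in the file, which is why small files sometimes survive an ASCII transfer while large ones almost never do.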

You need to download the file completely first, since ftp.retrieveFileStream() doesn't support file seeking.
Your code should be:
FTPClient ftp = new FTPClient();
ftp.connect("ftp.ncbi.nih.gov");
ftp.login("anonymous", "");
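File downloaded = new File(""); // set this to a writable local path
FileOutputStream fos = new FileOutputStream(downloaded);
ftp.retrieveFile("/genbank/gbbct1.seq.gz", fos);
fos.close(); // finish writing the local copy before reading it back
InputStream is = new GZIPInputStream(new FileInputStream(downloaded));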
try {
byte[] buffer = new byte[65536];
int noRead;
while ((noRead = is.read(buffer)) != -1) {
System.out.write(buffer, 0, noRead);
}
} finally {
is.close();
ftp.disconnect();
}

Related

FTP downloaded file, error when uncompressing

I am downloading a few .gz files from an FTP server and uncompressing them to read the data. I am getting the following error:
java.io.IOException: Corrupt GZIP trailer
at java.util.zip.GZIPInputStream.readTrailer(GZIPInputStream.java:200)
at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:92)
at java.io.FilterInputStream.read(FilterInputStream.java:90)
at com.omnesys.xdk.ClsXDKRTWeb.UnGunZip(ClsXDKRTWeb.java:961)
at com.omnesys.xdk.ClsXDKRTWeb.DeCompress(ClsXDKRTWeb.java:857)
at com.omnesys.xdk.ClsXDKRTWeb.FTPDownloadProcess(ClsXDKRTWeb.java:629)
at com.omnesys.xdk.ClsXDKRTWeb.ProcessRequestXML(ClsXDKRTWeb.java:460)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jboss.as.ee.component.ManagedReferenceMethodInterceptorFactory$ManagedReferenceMethodInterceptor.processInvocation(ManagedReferenceMethodInterceptorFactory.java:72)
The code for the FTP download and decompression is as follows.
FTPClient ftp;
FTPClientConfig config;
ftp = new FTPClient();
config = new FTPClientConfig();
ftp.configure(config);
ftp.connect(strFTPServername);
ftp.user(strFTPUserName);
ftp.pass(strFTPUserPwd);
ftp.setFileType(FTP.BINARY_FILE_TYPE);
OutputStream local = new BufferedOutputStream(new FileOutputStream(strCmnDwnldPath));
ftp.retrieveFile(strSrcFilePath, local);
local.close();
if (!FTPReply.isPositiveCompletion(ftp.getReplyCode())) {
ftp.disconnect();
return false;
} else {
ftp.logout();
}
private boolean UnGunZip(String filename, String outputFolder) {
byte[] buffer = new byte[1024];
try {
String sfilename = new File(filename).getName();
sfilename = sfilename.substring(0, sfilename.indexOf(".gz"));
FileInputStream fileIn = new FileInputStream(filename);
GZIPInputStream gZIPInputStream = new GZIPInputStream(fileIn);
FileOutputStream fileOutputStream = new FileOutputStream(outputFolder + File.separator + sfilename);
int count;
while ((count = gZIPInputStream.read(buffer)) > 0) {
fileOutputStream.write(buffer, 0, count);
}
gZIPInputStream.close();
fileOutputStream.close();
} catch (IOException ex) {
ex.printStackTrace();
return false;
}
return true;
}
My application runs in a Linux environment. When I try to extract the downloaded file on Windows, I get an error saying the file is broken.
When I download the same file from a Windows environment, I don't face this issue.
Can someone help me fix this?
[EDIT:] I found this question. According to it, a file that was uploaded as ASCII should also be downloaded as ASCII. But how do I find out whether the file was uploaded using an ASCII transfer?
Try to remove the "BufferedOutputStream"
OutputStream local = new BufferedOutputStream(new FileOutputStream(strCmnDwnldPath));
This should be enough:
OutputStream local = new FileOutputStream(strCmnDwnldPath);

Unzip response stream from zipped with GZipStream (C#) in GZIPInputStream

I am trying to unzip a response from a .NET middleware. The response has been zipped using GZipStream.
GZipStream zipStream = new GZipStream(fileStream, CompressionMode.Compress, true);
When I use GZIPInputStream in Java to unzip the file, I get an IOException with the message "not in zip format" in the following code.
GZIPInputStream gzin = new GZIPInputStream(response);
I tried this too.
ByteArrayInputStream memstream = new ByteArrayInputStream(buffer2);
GZIPInputStream gzin = new GZIPInputStream(memstream);
Any help or suggestions are welcomed.
Thanks in advance
Try something like this
// try-with-resources closes both streams even if an exception is thrown
try (GZIPInputStream gis = new GZIPInputStream(new FileInputStream("pathOfTheGZipFile"));
     FileOutputStream fos = new FileOutputStream("pathOfDecompressedFile")) {
    // available() does not report the decompressed size, so use a fixed-size buffer
    byte[] buffer = new byte[8192];
    int len;
    while ((len = gis.read(buffer)) != -1) {
        fos.write(buffer, 0, len);
    }
} catch (IOException e) {
    e.printStackTrace();
}
I have finally figured out the solution. In the response returned by the server, the first few bytes were not zipped, so if anyone is facing the same issue, check the leading bytes. After I removed those bytes from the response, it started working.
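One way to check the leading bytes is to search for the gzip header, which starts with the magic bytes 0x1f 0x8b followed by the compression method 8 (deflate). The sketch below is a rough illustration using only the JDK; the class and method names are hypothetical, and it assumes the uncompressed prefix does not itself happen to contain that three-byte pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipOffsetDemo {

    // Returns the index of the first gzip header (0x1f 0x8b 0x08), or -1 if none is found.
    static int findGzipStart(byte[] data) {
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xff) == 0x1f && (data[i + 1] & 0xff) == 0x8b && data[i + 2] == 8) {
                return i;
            }
        }
        return -1;
    }

    // Decompresses the gzip stream that begins at the given offset.
    static byte[] gunzipFrom(byte[] data, int offset) {
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(data, offset, data.length - offset))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper used only to build sample input for the demo.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gz = gzip("hello from the server".getBytes(StandardCharsets.UTF_8));
        byte[] prefix = "HDR:".getBytes(StandardCharsets.US_ASCII); // made-up uncompressed prefix
        byte[] response = new byte[prefix.length + gz.length];
        System.arraycopy(prefix, 0, response, 0, prefix.length);
        System.arraycopy(gz, 0, response, prefix.length, gz.length);

        int start = findGzipStart(response);
        System.out.println("gzip data starts at offset " + start);
        System.out.println(new String(gunzipFrom(response, start), StandardCharsets.UTF_8));
    }
}
```

If the prefix has a fixed, documented length, skipping exactly that many bytes is more robust than scanning for the header.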

Read first bytes of a file

I need a very simple function that allows me to read the first 1k bytes of a file through FTP. I want to use it in MATLAB to read the first lines and, according to some parameters, to download only files I really need eventually. I found some examples online that unfortunately do not work. Here I'm proposing the sample code where I'm trying to download one single file (I'm using the Apache libraries).
FTPClient client = new FTPClient();
FileOutputStream fos = null;
try {
client.connect("data.site.org");
// filename to be downloaded.
String filename = "filename.Z";
fos = new FileOutputStream(filename);
// Download file from FTP server
InputStream stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
byte[] b = new byte[1024];
stream.read(b);
fos.write(b);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (fos != null) {
fos.close();
}
client.disconnect();
} catch (IOException e) {
e.printStackTrace();
}
}
The error is in stream, which comes back empty. I know I'm passing the folder name the wrong way, but I cannot work out how it should be done. I've tried it in many ways.
I've also tried with the URL's Java classes as:
URL url;
url = new URL("ftp://data.site.org/pub/obs/2008/021/ab120210.08d.Z");
URLConnection con = url.openConnection();
BufferedInputStream in =
new BufferedInputStream(con.getInputStream());
FileOutputStream out =
new FileOutputStream("C:\\filename.Z");
int i;
byte[] bytesIn = new byte[1024];
if ((i = in.read(bytesIn)) >= 0) {
out.write(bytesIn);
}
out.close();
in.close();
but it gives an error when I close the InputStream in!
I'm definitely stuck. Any comments would be very useful!
Try this test
InputStream is = new URL("ftp://test:test@ftp.secureftp-test.com/bookstore.xml").openStream();
byte[] a = new byte[1000];
int n = is.read(a);
is.close();
System.out.println(new String(a, 0, n));
it definitely works
In my experience, when you read bytes from a stream acquired from ftpClient.retrieveFileStream, the first read is not guaranteed to fill your byte buffer. You should either loop on the return value of stream.read(b) or use a library to fill the 1024-byte buffer:
InputStream stream = null;
try {
// Download file from FTP server
stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
byte[] b = new byte[1024];
int count = IOUtils.read(stream, b); // keeps calling stream.read() until the buffer is full or end-of-file is reached
fos.write(b, 0, count); // write only the bytes that were actually read
} catch (IOException e) {
e.printStackTrace();
} finally {
IOUtils.closeQuietly(stream);
}
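The loop-based alternative can be sketched with the JDK alone. This is an illustration rather than the answerer's code: `readUpTo`, `firstBytes`, and `trickle` are hypothetical names, and `trickle` only imitates a network stream that returns a few bytes per call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class ReadPrefixDemo {

    // Keeps reading until the buffer is full or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    static int readUpTo(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n == -1) {
                break; // end of stream before the buffer filled up
            }
            total += n;
        }
        return total;
    }

    // Convenience wrapper: returns the first `limit` bytes, or fewer at end-of-file.
    static byte[] firstBytes(InputStream in, int limit) {
        byte[] buffer = new byte[limit];
        try {
            int count = readUpTo(in, buffer);
            return Arrays.copyOf(buffer, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a slow network stream: hands out at most `chunk` bytes per read.
    static InputStream trickle(byte[] data, int chunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, chunk));
            }
        };
    }

    public static void main(String[] args) {
        byte[] data = new byte[5000];
        Arrays.fill(data, (byte) 'x');
        byte[] head = firstBytes(trickle(data, 100), 1024);
        System.out.println("bytes read: " + head.length);
    }
}
```

On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` performs the same kind of loop.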
I could not understand why it didn't work. I found this link, where they used the Apache library to read 4096 bytes at a time. Reading the first 1024 bytes now works. The only catch is that if completePendingCommand() is used, the program hangs forever, so I removed it and everything works fine.

InputStream - Dealing with network changes

I'm downloading an attachment using the JavaMail API. Whenever there is a small change in network state, my app gets stuck and I have to restart it. It doesn't even crash.
This is the code snippet:
InputStream is = bodyPart.getInputStream();
String fileName = MimeUtility.decodeText(bodyPart.getFileName());
// Downloading the file
File f = new File(Constants.getPath() + fileName);
try {
FileOutputStream fos;
fos = new FileOutputStream(f);
byte[] buf = new byte[8*1024];
int bytesRead;
while ((bytesRead = is.read(buf)) != -1) {
fos.write(buf, 0, bytesRead);
}
fos.close();
}
What is the best way to deal with this issue? Thanks.
Your application is stuck. The solution to that is to set a read timeout, as discussed in this question. If the timeout occurs a SocketTimeoutException will be thrown.
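The effect of a read timeout can be shown with plain sockets. This is a minimal sketch, not JavaMail code: it connects to a local server that never sends anything, so the read would block forever without `setSoTimeout`. The class and method names are hypothetical. JavaMail exposes comparable settings through session properties such as `mail.imap.timeout` and `mail.imap.connectiontimeout` (assuming IMAP is the protocol in use).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {

    // Connects to a local server that accepts the connection but never sends data,
    // and reports whether the blocked read was ended by the timeout.
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // applies to every subsequent read on this socket
            try (Socket accepted = server.accept()) { // keep the connection open, send nothing
                InputStream in = client.getInputStream();
                in.read(); // blocks until data arrives or the timeout expires
                return false;
            } catch (SocketTimeoutException e) {
                return true; // the caller can now retry, report an error, or give up
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

After a timeout the socket is still usable, so the application decides whether to retry the read or close the connection.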

Java request file, send file (Client-server)

I'm making a client-server application. So far the server can send a hardcoded file, but not one specified by the client. I only need to send text files. As far as I understand, the client first sends the file name and then the server sends the file. Nothing complicated, but I'm getting all kinds of errors; this code gets a connection reset/socket closed error. The main problem is that I haven't had much time to research networking.
I'll appreciate any help I can get.
EDIT:
I found a workaround. Closing a stream causes the socket to close. Why is that? It shouldn't happen, should it?
Server Side:
InputStream sin=newCon.getInputStream();
DataInputStream sdata=new DataInputStream(sin);
location=sdata.readUTF();
//sdata.close();
//sin.close();
File toSend=new File(location);
byte[] array=new byte[(int)toSend.length()];
FileInputStream fromFile=new FileInputStream(toSend);
BufferedInputStream toBuffer=new BufferedInputStream(fromFile);
toBuffer.read(array,0,array.length);
OutputStream out=newCon.getOutputStream(); //Socket-closed...
out.write(array,0,array.length);
out.flush();
toBuffer.close();
newCon.close();
ClientSide:
int bytesRead;
server=new Socket(host,port);
OutputStream sout=server.getOutputStream();
DataOutputStream sdata=new DataOutputStream(sout);
sdata.writeUTF(interestFile);
//sdata.close();
//sout.close();
InputStream in=server.getInputStream(); //socket closed..
OutputStream out=new FileOutputStream("data.txt");
byte[] buffer=new byte[1024];
while((bytesRead=in.read(buffer))!=-1)
{
out.write(buffer,0,bytesRead);
}
out.close();
server.close();
Try reading the file in chunks from Server while writing to client output stream rather than creating a temp byte array and reading entire file into memory. What if requested file is large? Also close the new Socket on server-side in a finally block so socket is closed even if an exception is thrown.
Server Side:
Socket newCon = ss.accept();
FileInputStream is = null;
OutputStream out = null;
try {
InputStream sin = newCon.getInputStream();
DataInputStream sdata = new DataInputStream(sin);
String location = sdata.readUTF();
System.out.println("location=" + location);
File toSend = new File(location);
// TODO: validate file is safe to access here
if (!toSend.exists()) {
System.out.println("File does not exist");
return;
}
is = new FileInputStream(toSend);
out = newCon.getOutputStream();
int bytesRead;
byte[] buffer = new byte[4096];
while ((bytesRead = is.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
out.flush();
} finally {
if (out != null)
try {
out.close();
} catch(IOException e) {
}
if (is != null)
try {
is.close();
} catch(IOException e) {
}
newCon.close();
}
If you use the IOUtils class from Apache Commons IO, you can cut much of the code needed to copy files to streams. Here the five-line copy loop shrinks to one line.
org.apache.commons.io.IOUtils.copy(is, out);
Note that a server that serves files by absolute path to remote clients is potentially dangerous. The target file should be restricted to a given directory and/or set of file types. You don't want to serve system-level files to unauthenticated clients.
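One possible way to restrict requests to a directory is to resolve the requested name against a base directory and reject anything that ends up outside it. The sketch below uses `java.nio.file` only; the class name, method names, and the `/srv/files` directory are assumptions made for the example. It does not follow symbolic links, so a stricter check would also compare real paths (for instance with `Path.toRealPath()`).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class SafePathDemo {

    // Resolves the requested name against the base directory and rejects
    // anything that escapes it, for example via ".." or an absolute path.
    static Path resolveInside(Path baseDir, String requested) {
        Path base = baseDir.toAbsolutePath().normalize();
        Path target = base.resolve(requested).normalize();
        if (!target.startsWith(base)) {
            throw new IllegalArgumentException("Path escapes base directory: " + requested);
        }
        return target;
    }

    // Boolean variant for callers that prefer a check over an exception.
    static boolean isInside(Path baseDir, String requested) {
        try {
            resolveInside(baseDir, requested);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get("/srv/files"); // hypothetical directory the server is allowed to serve
        System.out.println(isInside(base, "notes.txt"));
        System.out.println(isInside(base, "../../etc/passwd"));
    }
}
```

In the server code above, the value read with readUTF() would go through such a check before the File is opened.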
