Unexpected end of ZLIB input stream in Java

public class GzExtractor implements Extractor {
    Logger logger = LoggerFactory.getLogger(GzExtractor.class);
    private static final int BUFFER_SIZE = 1024;
    byte[] buff = new byte[BUFFER_SIZE];
    private File file;
    private String destinationPath;

    public GzExtractor(File file, String destinationPath) {
        this.file = file;
        this.destinationPath = destinationPath;
    }

    public void extract() {
        try {
            File destDir = new File(destinationPath);
            if (!destDir.exists()) {
                destDir.mkdir();
            }
            GZIPInputStream gZipObj = new GZIPInputStream(new FileInputStream(file));
            String extractedFilename = file.getName().split(".gz")[0];
            OutputStream fosObj = new FileOutputStream(destinationPath + extractedFilename);
            int len;
            while ((len = gZipObj.read(buff)) > 0) {
                fosObj.write(buff, 0, len);
            }
            gZipObj.close();
            fosObj.close();
        } catch (Exception e) {
            logger.info("GZ Exception : {}", e.getMessage());
        }
    }
}
I'm getting the "Unexpected end of ZLIB input stream" error, but the file is extracted successfully.
I tried several solutions, but none of them fixed this. I also tried closing the gzip stream before reading, as suggested in one of the answers here, but that of course throws a different error.
I am confused about why I'm getting this, and I basically want to eliminate the error message.
[pool-1-thread-1] INFO service.ExtractorImpl.GzExtractor - GZ Exception : Unexpected end of ZLIB input stream

Okay, so the compressed file was probably not in the correct format. I was using an FTP server where I was uploading different kinds of files, i.e. zip, gzip, CSV, etc. My logic decided how to decompress each file according to its compression type. When downloading from my FTP server, I had forgotten to set the file type to binary, which is required for zip and gzip files:
ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
So the file being decompressed must not have been in the correct format; maybe that is why I was getting this error.
After setting the file type to binary, it worked okay.
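For context, a minimal sketch of the download step using Apache Commons Net (host, credentials, and paths below are hypothetical):

import org.apache.commons.net.ftp.FTP;
import org.apache.commons.net.ftp.FTPClient;

import java.io.FileOutputStream;
import java.io.OutputStream;

public class FtpDownload {
    public static void main(String[] args) throws Exception {
        FTPClient ftpClient = new FTPClient();
        ftpClient.connect("ftp.example.com"); // hypothetical host
        ftpClient.login("user", "password");  // hypothetical credentials
        // Without this, files transfer in ASCII mode and binary
        // formats such as .gz or .zip arrive corrupted.
        ftpClient.setFileType(FTP.BINARY_FILE_TYPE);
        try (OutputStream out = new FileOutputStream("local.gz")) {
            ftpClient.retrieveFile("/remote/path/file.gz", out);
        }
        ftpClient.logout();
        ftpClient.disconnect();
    }
}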

The file gets smaller after reading the jar package and writing it to another file

package com.example.demo.Util;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class Test {
    static HashMap<String, String> map = new HashMap<>();

    public static void main(String[] args) throws IOException {
        String data = "12j3h1i7tsa7sgdajk123y8asd: 88888";
        File jarFile = new File(new Test().getJarPath());
        File tempJar = upJarFile(jarFile, "BOOT-INF/classes/application.properties", data);
    }

    public static File upJarFile(File originalJarFile, String editFilePath, String content) throws IOException {
        File tempFile = File.createTempFile("temp", ".jar");
        JarFile jarFile = new JarFile(originalJarFile);
        Enumeration<JarEntry> entries = jarFile.entries();
        System.out.println("before:" + originalJarFile.length());
        JarOutputStream jarOutputStream = new JarOutputStream(new FileOutputStream(tempFile));
        while (entries.hasMoreElements()) {
            JarEntry jarEntry = entries.nextElement();
            jarOutputStream.putNextEntry(jarEntry);
            map.put(jarEntry.getName(), String.valueOf(jarEntry.getSize()));
            jarOutputStream.write(new Test().inputStreamToByteArray(jarFile.getInputStream(jarEntry)));
        }
        jarOutputStream.finish();
        jarOutputStream.close();
        jarFile.close();
        System.out.println(tempFile.getPath());
        System.out.println("after:" + tempFile.length());
        return tempFile;
    }

    public String getJarPath() {
        String path1 = System.getProperty("user.dir");
        File file = new File(path1 + "/target/");
        String jarFile = null;
        for (File file1 : file.listFiles()) {
            if (file1.getName().endsWith(".jar")) {
                jarFile = file1.getPath();
                break;
            }
        }
        return jarFile;
    }

    public byte[] inputStreamToByteArray(InputStream inputStream) {
        try (ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[1024];
            int num;
            while ((num = inputStream.read(buffer)) != -1) {
                byteArrayOutputStream.write(buffer, 0, num);
            }
            byteArrayOutputStream.flush();
            return byteArrayOutputStream.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return new byte[]{};
    }
}
As shown in the code above, I just read the entries of the incoming jar package as streams and write them out one by one, but when I compared the size of the input package with the size of the output temporary package, the output got smaller (before: 49651057 --> after: 49647985).
What could be causing this difference?
This can happen due to a number of reasons:
The original JAR file was created with a compression level that is not as high as the default compression level, so the JAR file that you create (with default compression) achieves better compression, and therefore it is smaller. You can verify this by opening both the original and the result JAR files with a ZIP utility (e.g. 7Zip) and examining their checksums and their compressed sizes. If the checksums are identical, but the compressed sizes differ, then the difference is simply due to better compression.
The original JAR file contains unused data. This can happen when sloppy archive creation software updates an archive by appending to it instead of rewriting it from scratch. You can verify this by opening the original ZIP archive with a ZIP utility (e.g. 7Zip) and saving it under a new filename. If the new file is smaller, then the original file contained some unused data.
The original JAR file contains files in subdirectories, which you are not checking. Thus, your output JAR file does not contain all of the files in the original. To fix this, you need to check each entry with jarEntry.isDirectory() and if so, recurse.
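To illustrate the first two checks, here is a rough sketch using java.util.zip that compares per-entry CRCs and compressed sizes of the two jars (the class and method names here are mine, not from the question):

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarCompare {
    // Identical CRCs with differing compressed sizes indicate the content
    // is the same and only the compression level differs.
    public static void compare(String originalPath, String rewrittenPath) throws Exception {
        try (ZipFile original = new ZipFile(originalPath);
             ZipFile rewritten = new ZipFile(rewrittenPath)) {
            Enumeration<? extends ZipEntry> entries = original.entries();
            while (entries.hasMoreElements()) {
                ZipEntry a = entries.nextElement();
                ZipEntry b = rewritten.getEntry(a.getName());
                if (b == null) {
                    System.out.println("missing in rewritten jar: " + a.getName());
                } else if (a.getCrc() != b.getCrc()) {
                    System.out.println("content differs: " + a.getName());
                } else if (a.getCompressedSize() != b.getCompressedSize()) {
                    System.out.println("same content, different compression: " + a.getName());
                }
            }
        }
    }
}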

Split/join a binary file into multiple parts without loading file into memory?

In Java, how do you split a binary file into multiple parts while only loading a small portion of the File into memory at one time?
So I have a file FullFile that is large. I need to upload it to cloud storage but it's so large that it often times out.
I can make this problem less likely if I split the file and upload in chunks.
So I need to split FullFile into files of chunk size MaxChunkSize.
List<File> fileSplit(File fullFile, int maxChunkSize)
File fileJoin(List<File> splitFiles)
Most code snippets around assume the file is text, but in my case the files are compressed binary.
What would be the best way to implement these methods?
Below is the full answer:
The maxChunkSize represents the size in bytes of a file chunk.
In the example below I read a 5 MB zip file, split it into five 1 MB chunks, and later join them back using the fileJoin method.
The stageLocally method stages the chunks locally, but you can modify it to work with any cloud storage. (Better to abstract this out so you can switch between multiple storage implementations.)
You can tweak maxChunkSize based on the amount of data you want to hold in memory at a given time.
The IOUtils.copy() method is from the Apache Commons IO library; here is the Maven link. You can also use Files.copy() in lieu of it. Files.copy() comes from the java.nio.file package, so you don't have to add an external dependency to use it.
I have omitted the exception handling for brevity.
public static void main(String[] args) throws IOException {
    File input = new File(_5_MB_FILE_PATH);
    File outPut = fileJoin(split(input, 1_024_000));
    System.out.println(IOUtils.contentEquals(Files.newInputStream(input.toPath()), Files.newInputStream(outPut.toPath())));
}

public static List<File> split(File largeFile, int maxChunkSize) throws IOException {
    List<File> list = new ArrayList<>();
    try (InputStream in = Files.newInputStream(largeFile.toPath())) {
        final byte[] buffer = new byte[maxChunkSize];
        int dataRead = in.read(buffer);
        while (dataRead > -1) {
            list.add(stageLocally(buffer, dataRead));
            dataRead = in.read(buffer);
        }
    }
    return list;
}

private static File stageLocally(byte[] buffer, int length) throws IOException {
    File outPutFile = File.createTempFile("temp-", "split", new File(TEMP_DIRECTORY));
    try (FileOutputStream fos = new FileOutputStream(outPutFile)) {
        fos.write(buffer, 0, length);
    }
    return outPutFile;
}

public static File fileJoin(List<File> list) throws IOException {
    File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
    try (FileOutputStream fileOutputStream = new FileOutputStream(outPutFile)) {
        for (File file : list) {
            try (InputStream in = Files.newInputStream(file.toPath())) {
                IOUtils.copy(in, fileOutputStream);
            }
        }
    }
    return outPutFile;
}
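For the dependency-free variant mentioned above, the join can also be written with Files.copy(Path, OutputStream) from java.nio.file; a minimal sketch, assuming the same TEMP_DIRECTORY constant:

public static File fileJoinNio(List<File> list) throws IOException {
    File outPutFile = File.createTempFile("temp-", "unsplit", new File(TEMP_DIRECTORY));
    try (OutputStream out = new FileOutputStream(outPutFile)) {
        for (File file : list) {
            // Streams each chunk into the output without loading it fully into memory.
            Files.copy(file.toPath(), out);
        }
    }
    return outPutFile;
}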
Let me know if this helps.

Detecting File extension using ApacheTika corrupts the File

I am trying to detect the file extension of a file passed as an InputStream. The extension is detected correctly, but the file tends to become corrupted afterwards. Here is my method for detecting the extension:
public static Optional<String> detectFileExtension(InputStream inputStream) {
    // To provide mark/reset functionality to the stream required by Tika.
    InputStream bufferedInputStream = new BufferedInputStream(inputStream);
    String extension = null;
    try {
        MimeTypes mimeRepository = getMimeRepository();
        MediaType mediaType = mimeRepository.detect(bufferedInputStream, new Metadata());
        MimeType mimeType = mimeRepository.forName(mediaType.toString());
        extension = mimeType.getExtension();
        log.info("File Extension detected: {}", extension);
        // Need to reset input stream pos marker since it was updated while detecting the extension
        inputStream.reset();
        bufferedInputStream.close();
    } catch (MimeTypeException | IOException ignored) {
        log.error("Unable to detect extension of the file from the provided stream");
    }
    return Optional.ofNullable(extension);
}

private static MimeTypes getMimeRepository() {
    TikaConfig config = TikaConfig.getDefaultConfig();
    return config.getMimeRepository();
}
Now, when I try to save this file after extension detection, again using the same InputStream:
byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);
Optional<String> extension = FileTypeHelper.detectFileExtension(inputStream);
if (extension.isPresent()) {
    fileName = fileName + extension.get();
} else {
    log.warn("File: {} does not have a valid extension", fileName);
}
File file = new File("/tmp/" + fileName);
FileUtils.writeByteArrayToFile(file, documentContentByteArray);
It creates a file, but a corrupted one. I suspect the stream is not being reset properly after it is consumed in detectFileExtension. If someone has done this before, some guidance would be great; thanks in advance.
I fixed it by not reusing the same input stream. I read the stream into a byte array once, then created a new stream from it for extension detection and used the byte array itself for creating the file.
byte[] documentContentByteArray = IOUtils.toByteArray(inputStream);

// extension detection
InputStream extensionDetectionInputStream = new ByteArrayInputStream(documentContentByteArray);
Optional<String> extension = FileTypeHelper.detectFileExtension(extensionDetectionInputStream);
if (extension.isPresent()) {
    fileName = fileName + extension.get();
} else {
    log.warn("File: {} does not have a valid extension", fileName);
}
extensionDetectionInputStream.close();

// file creation
File file = new File("/tmp/" + fileName);
FileUtils.writeByteArrayToFile(file, documentContentByteArray);
If there is a better way to do this by reusing the same stream, that would be great and I'll gladly accept that answer; for now, I am marking this as the accepted answer.
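For what it's worth, here is a sketch of how a single stream could be reused: keep working with the BufferedInputStream itself, mark it before detection, and reset it afterwards. DETECT_READ_LIMIT is a hypothetical constant that must cover however many bytes Tika inspects.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamReuse {
    // Hypothetical read limit; mark() is invalidated once more than this
    // many bytes are read before reset().
    private static final int DETECT_READ_LIMIT = 64 * 1024;

    public static InputStream detectAndRewind(InputStream inputStream) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(inputStream);
        buffered.mark(DETECT_READ_LIMIT);
        // ... run Tika detection against 'buffered' here ...
        buffered.reset(); // rewind to the marked position
        return buffered;  // all later reads must go through this same stream
    }
}

The crucial differences from the code in the question are that reset() is called on the buffered stream rather than the underlying one, and that every subsequent read goes through that same buffered stream.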

Creation gzip archive using Apache Commons Compress

I succeeded in creating a gz archive with the expected content, but how can I set the filename inside the archive?
I mean, if an archive myfile.gz is created, the file inside it will be named "myfile", but I want it to be named like the source file, for example "1.txt".
Current code:
public static void gz() throws FileNotFoundException, IOException {
    GZIPOutputStream out = null;
    String filePaths[] = {"C:/Temp/1.txt", "C:/Temp/2.txt"};
    try {
        out = new GZIPOutputStream(
                new BufferedOutputStream(new FileOutputStream("C:/Temp/myfile.gz")));
        RandomAccessFile f = new RandomAccessFile(filePaths[0], "r");
        byte[] b = new byte[(int) f.length()];
        f.readFully(b);
        f.close();
        out.write(b, 0, b.length);
        out.finish();
    } finally {
        if (out != null) out.close();
    }
}
GZip compresses a single stream. Typically, when people use GZip with multiple files, they also use tar to bundle them together first.
gzip archive with multiple files inside
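Since the question mentions Apache Commons Compress: the gzip header has an optional original-filename field (FNAME), and Commons Compress exposes it through GzipParameters. A minimal sketch along the lines of the original code (paths reused from the question):

import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipParameters;

import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class GzWithName {
    public static void main(String[] args) throws IOException {
        GzipParameters parameters = new GzipParameters();
        parameters.setFilename("1.txt"); // stored in the gzip FNAME header field
        try (OutputStream fos = new BufferedOutputStream(new FileOutputStream("C:/Temp/myfile.gz"));
             GzipCompressorOutputStream out = new GzipCompressorOutputStream(fos, parameters);
             InputStream in = new FileInputStream("C:/Temp/1.txt")) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }
}

Note that this only labels the single compressed stream; for multiple files you still need tar.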

How do I get the filename of a file inside a gzip in Java?

int BUFFER_SIZE = 4096;
byte[] buffer = new byte[BUFFER_SIZE];
try {
    InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
    OutputStream output = new FileOutputStream("current_output_name");
    int n = input.read(buffer, 0, BUFFER_SIZE);
    while (n >= 0) {
        output.write(buffer, 0, n);
        n = input.read(buffer, 0, BUFFER_SIZE);
    }
    input.close();
    output.close();
} catch (IOException e) {
    System.out.println("error: \n\t" + e.getMessage());
}
Using the above code I can successfully extract a gzip file's contents, although the extracted file's name will, as expected, always be current_output_name (I know it's because I declared it that way in the code). My problem is that I don't know how to get the file's original name while it is still inside the archive.
Though java.util.zip provides ZipEntry, I couldn't use it on gzip files.
Any alternatives?
While I partly agree with Michael Borgwardt's reply, it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is (as far as I know) no way of getting at that name in current Java (1.6). As seen in the getHeader method of the OpenJDK implementation of GZIPInputStream,
they skip reading the file name:
// Skip optional file name
if ((flg & FNAME) == FNAME) {
    while (readUByte(in) != 0) ;
}
I modified the GZIPInputStream class to get the optional filename out of the gzip header (I'm not sure I am allowed to do that; download the original version from here). You only need to add a String filename; member to the class and modify the above code to be:
// Read the optional file name
if ((flg & FNAME) == FNAME) {
    filename = "";
    int _byte = 0;
    while ((_byte = readUByte(in)) != 0) {
        filename += (char) _byte;
    }
}
And it worked for me.
Apache Commons Compress offers two options for obtaining the filename:
With metadata (Java 7+ sample code)
try (GzipCompressorInputStream gcis = new GzipCompressorInputStream(
        new FileInputStream("a_gunzipped_file.gz"))) {
    String filename = gcis.getMetaData().getFilename();
}
With "the convention"
String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");
References
Apache Commons Compress
GzipCompressorInputStream
See also: GzipUtils#getUncompressedFilename
Actually, the GZIP file format, through its use of multiple members, allows an original filename to be specified: a member whose header sets the FNAME flag can carry the name. I do not see a way to do this in the standard Java libraries, though.
http://www.gzip.org/zlib/rfc-gzip.html#specification
Following the answers above, here is an example that creates a file "myTest.csv.gz" containing a file "myTest.csv". Notice that you can't change the internal file name, and you can't add more files into the gz file.
@Test
public void gzipFileName() throws Exception {
    File workingFile = new File("target", "myTest.csv.gz");
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream(new FileOutputStream(workingFile));
    PrintWriter writer = new PrintWriter(gzipOutputStream);
    writer.println("hello,line,1");
    writer.println("hello,line,2");
    writer.close();
}
Gzip is purely compression. There is no archive, it's just the file's data, compressed.
The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So, logfile.txt becomes logfile.txt.gz when compressed, and again logfile.txt when it's decompressed. If you rename the file, the name information is lost.
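The Commons Compress GzipUtils helpers mentioned earlier implement exactly this naming convention; a small sketch:

import org.apache.commons.compress.compressors.gzip.GzipUtils;

public class NamingConvention {
    public static void main(String[] args) {
        // Maps a name to and from its conventional .gz counterpart.
        System.out.println(GzipUtils.getCompressedFilename("logfile.txt"));      // logfile.txt.gz
        System.out.println(GzipUtils.getUncompressedFilename("logfile.txt.gz")); // logfile.txt
    }
}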
