Java 7zip compression is too big

I have a Java program that searches for a folder named with yesterday's date, compresses it into a 7z file and then deletes the folder. I have noticed that the 7z archives generated by my program are far too big. When I compress the same files (873 KB in total) with a program like 7-Zip File Manager, the archive is 5 KB, while my program generates a 737 KB archive. I am afraid that my program does not really compress the files into a 7z archive but into an ordinary zip file. Is there something I can change in my code so that it generates a smaller 7z file, as 7-Zip File Manager does?
package SevenZip;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.concurrent.TimeUnit;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;
public class SevenZipUtils {
public static void main(String[] args) throws InterruptedException, IOException {
String sourceFolder = "C:/Users/Ferid/Documents/Dates/";
String outputZipFile = "/Users/Ferid/Documents/Dates";
int sleepTime = 0;
compress(sleepTime, outputZipFile, sourceFolder);
}
public static boolean deleteDirectory(File directory, int sleepTime) throws InterruptedException {
if (directory.exists()) {
File[] files = directory.listFiles();
if (null != files) {
for (int i = 0; i < files.length; i++) {
if (files[i].isDirectory()) {
deleteDirectory(files[i], sleepTime);
System.out.println("Folder deleted: " + files[i]);
} else {
files[i].delete();
System.out.println("File deleted: " + files[i]);
}
}
}
}
TimeUnit.SECONDS.sleep(sleepTime);
return (directory.delete());
}
public static void compress(int sleepTime, String outputZipFile, String sourceFolder)
throws IOException, InterruptedException {
// finds folder of yesterdays date
final Calendar cal = Calendar.getInstance();
cal.add(Calendar.DATE, -1); // date of yesterday
String timeStamp = new SimpleDateFormat("yyyyMMdd").format(cal.getTime()); // format the date
System.out.println("Yesterday was " + timeStamp);
if (sourceFolder.endsWith("/")) { // add yesterday folder to sourcefolder path
sourceFolder = sourceFolder + timeStamp;
} else {
sourceFolder = sourceFolder + "/" + timeStamp;
}
if (outputZipFile.endsWith("/")) { // add yesterday folder name to outputZipFile path
outputZipFile = outputZipFile + timeStamp + ".7z";
} else {
outputZipFile = outputZipFile + "/" + timeStamp + ".7z";
}
File file = new File(sourceFolder);
if (file.exists()) {
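try (SevenZOutputFile out = new SevenZOutputFile(new File(outputZipFile))) {
addToArchiveCompression(out, file, ".");
}
// the archive is only complete once the try-with-resources block has closed it,
// so the source folder is deleted afterwards
System.out.println("Files successfully compressed");
deleteDirectory(new File(sourceFolder), sleepTime);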
} else {
System.out.println("Folder does not exist");
}
}
private static void addToArchiveCompression(SevenZOutputFile out, File file, String dir) throws IOException {
String name = dir + File.separator + file.getName();
if (file.isFile()) {
SevenZArchiveEntry entry = out.createArchiveEntry(file, name);
out.putArchiveEntry(entry);
// try-with-resources closes the input stream even if reading fails
try (FileInputStream in = new FileInputStream(file)) {
byte[] b = new byte[8192];
int count;
while ((count = in.read(b)) != -1) {
out.write(b, 0, count);
}
}
out.closeArchiveEntry();
System.out.println("File added: " + file.getName());
} else if (file.isDirectory()) {
File[] children = file.listFiles();
if (children != null) {
for (File child : children) {
addToArchiveCompression(out, child, name);
}
}
System.out.println("Directory added: " + file.getName());
} else {
System.out.println(file.getName() + " is not supported");
}
}
}
I am using the Apache Commons Compress library.
EDIT: Here is a link to where I got some of the Apache Commons Compress code from.

Commons Compress starts a new block in the container file for each archive entry. Note the block counter here:
Not quite the answer you were hoping for, but the docs say it doesn't support "solid compression" (writing several files into a single block). See paragraph 5 in the docs here.
A quick look around found a few other Java libraries that support LZMA compression, but I couldn't spot one that could do so within the parent container file format for 7-Zip. Perhaps someone else knows of an alternative...
It sounds like a normal zip file format (e.g. via ZipOutputStream) is not an option?
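If a regular zip file is acceptable, the JDK's own java.util.zip is enough. The following is a minimal sketch (the class name ZipFolder is hypothetical) that zips every file below a folder with the highest Deflate level. Note that zip compresses each entry separately, so, like the Commons Compress code in the question, it cannot exploit redundancy across files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipFolder {

    /** Zips every regular file below sourceDir into zipFile, using maximum Deflate compression. */
    public static void zipFolder(Path sourceDir, Path zipFile) throws IOException {
        // collect the file list first, so a zip created inside sourceDir is not picked up
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream os = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(os)) {
            zos.setLevel(Deflater.BEST_COMPRESSION);
            for (Path file : files) {
                // zip entry names always use '/' regardless of the platform separator
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```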

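Use the 7-Zip file archiver's LZMA SDK instead; it easily compresses an 832 KB file to 26.0 KB:
Get its Jar and SDK.
Choose the .java files related to LZMA compression.
Add the run arguments to the project properties: e "D:\\2017ASP.pdf" "D:\\2017ASP.7z", where e stands for encode, followed by the input path and the output path.
Run the project [LzmaAlone.java]. Note that LzmaAlone compresses a single input file into a raw LZMA stream rather than into a .7z archive containing several files.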
Results
Case 1 (.pdf file): from 33,969 KB to 24,645 KB.
Case 2 (.docx file): from 832 KB to 26.0 KB.

I don't have enough rep to comment anymore so here are my thoughts:
I don't see where you set the compression ratio, so it could be that SevenZOutputFile uses no (or very low) compression. As @CristiFati said, the difference in compression is odd, especially for text files.
As noted by @df778899, there is no support for solid compression, which is how the best compression ratio is achieved, so you won't be able to do as well as the 7z command line.
That said, if zip really isn't an option, your last resort could be to call the 7z command-line tool directly from your program.
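A minimal sketch of calling the command line from Java with ProcessBuilder, assuming 7-Zip is installed and that sevenZipExe is the path to its command-line executable (for example 7z.exe on Windows; the exact location is an assumption). The switches used are a (add to archive), -t7z (7z format), -mx=9 (maximum compression level) and -ms=on (solid mode). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class SevenZipCli {

    /** Builds the 7-Zip command line: add sourceDir to a solid 7z archive at maximum compression. */
    public static List<String> buildCommand(String sevenZipExe, Path archive, Path sourceDir) {
        return Arrays.asList(sevenZipExe, "a", "-t7z", "-mx=9", "-ms=on",
                archive.toString(), sourceDir.toString());
    }

    /** Runs 7-Zip and returns its exit code (0 means no error). */
    public static int compress(String sevenZipExe, Path archive, Path sourceDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(sevenZipExe, archive, sourceDir));
        pb.inheritIO(); // show 7-Zip's output in this program's console
        return pb.start().waitFor();
    }
}
```

In the question's program, a call like this would replace the Commons Compress block, and deleteDirectory would only be called when the exit code is 0.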
If pure 7z is not mandatory, another option would be to use a "tgz"-like format to emulate solid compression: first pack all files into an uncompressed file (e.g. tar format, or a zip file with no compression), then compress that single file with the standard Java Deflate algorithm. Of course, this is only viable if the processes that later consume the archive recognize that format.
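A minimal sketch of that idea using only the JDK, since it has no tar support: the files are written into a zip whose entries use compression level 0, and that whole zip is streamed through a single GZIPOutputStream, so the outer Deflate stream sees all files one after another. The class name and the .zip.gz naming are assumptions. Keep in mind that Deflate only looks back 32 KB, so this helps most with many small, similar files and will still not match 7-Zip's solid LZMA compression on larger data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SolidZipGz {

    /** Packs all files below sourceDir into an uncompressed zip and gzips it as one stream. */
    public static void compress(Path sourceDir, Path target) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        // closing zos finishes the zip, then the gzip trailer, then the file
        try (OutputStream os = Files.newOutputStream(target);
             GZIPOutputStream gz = new GZIPOutputStream(os);
             ZipOutputStream zos = new ZipOutputStream(gz)) {
            // level 0: entry data is stored as-is, so only the outer gzip stream compresses
            zos.setLevel(Deflater.NO_COMPRESSION);
            for (Path file : files) {
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```

To read such a file back in Java, wrap the streams the other way round: new ZipInputStream(new GZIPInputStream(...)).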

Related

Java loses the last "/" character of the path

I am developing a Java application that performs operations on files.
In particular, I move and copy files, and I have written two functions for this.
The functions take the strings sourcePath and targetPath as parameters.
I am developing on a Mac, and I have given the folders I need 777 permissions.
The problem is that when I pass the paths to the copyFile and moveFile functions, the last "/" of the path is lost, and consequently I get a java.nio.file.NoSuchFileException.
I have read both the Java documentation and online resources but have not found any answers.
Any suggestion or advice is welcome. I should add that when I hard-code the paths inside the functions instead of passing sourcePath and targetPath, both functions behave as they should.
copyFile:
public static boolean copyFile(String sourcePath, String targetPath) throws IOException {
boolean fileCopied = true;
// if I pass sourcePath, the last / is lost
File dirFiles = new File("/Users/myname/Documents/deleghe/remote/F24_CT/deleghe_da_inviare_a_icbpi/");
File[] listOfFiles = dirFiles.listFiles();
String dest = "/Users/myname/Documents/deleghe/local/F24_CT/deleghe_da_inviare_a_icbpi/";
for (File file : listOfFiles) {
Files.copy(file.toPath(),
(new File(dest + file.getName())).toPath(),
StandardCopyOption.REPLACE_EXISTING);
}
return fileCopied;
}
moveFile:
public static boolean moveFile(String sourcePath, String targetPath) throws IOException {
boolean fileMoved = true;
// if I pass sourcePath, the last / is lost
File dirFiles = new File("/Users/myname/Documents/deleghe/remote/F24_CT/deleghe_da_inviare_a_icbpi/");
File[] listOfFiles = dirFiles.listFiles();
String dest = "/Users/myname/Documents/deleghe/remote/F24_CT/deleghe_inviate/";
for (File file : listOfFiles) {
if (file.length() >= 968 && file.length() <= 2057) {
Files.move(file.toPath(),
(new File(dest + file.getName())).toPath(),
StandardCopyOption.REPLACE_EXISTING);
System.out.println("File moved successfully: " + file.getName() + "!! \n");
} else {
System.out.println("Could not move the file: " + file.getName() + "!! \n");
}
}
return fileMoved;
}
Try using Paths.get(dest, file.getName()) instead of dest + file.getName() (concatenating strings to build paths is not best practice).
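A sketch of that suggestion (the class and method names are hypothetical): Paths.get and Path.resolve insert the separator themselves, so it does not matter whether sourcePath and targetPath end with "/". Files.copy takes Path arguments, so the resolved Path is passed directly.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyDir {

    /** Copies the regular files directly inside sourceDir into targetDir; returns how many were copied. */
    public static int copyFiles(String sourceDir, String targetDir) throws IOException {
        Path source = Paths.get(sourceDir); // a trailing "/" is harmless here
        Path target = Paths.get(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(source)) {
            for (Path file : stream) {
                if (Files.isRegularFile(file)) {
                    // resolve() joins target and file name with the correct separator
                    Files.copy(file, target.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```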
You are not losing anything: you are just reading files from a directory, and your code works without any exception. Check your directories and the files inside them once more.

Unable to copy "My Documents" in Java

I am trying to copy files, folders, subfolders, zip files, etc. from a given location to another location. I used the code below.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
public class CopyDirectoryExample
{
public static void main(String[] args)
{
File srcFolder = new File("C:\\Users\\Yohan\\Documents");
File destFolder = new File("D:\\Test");
//make sure source exists
if(!srcFolder.exists()){
System.out.println("Directory does not exist.");
//just exit
System.exit(0);
}else{
try{
copyFolder(srcFolder,destFolder);
}catch(IOException e){
e.printStackTrace();
//error, just exit
System.exit(0);
}
}
System.out.println("Done");
}
public static void copyFolder(File src, File dest)
throws IOException{
if(src.isDirectory()){
//if directory not exists, create it
if(!dest.exists()){
dest.mkdir();
System.out.println("Directory copied from "
+ src + " to " + dest);
}
//list all the directory contents
String files[] = src.list();
for (String file : files) {
//construct the src and dest file structure
File srcFile = new File(src, file);
File destFile = new File(dest, file);
//recursive copy
copyFolder(srcFile,destFile);
}
}else{
//if file, then copy it
//Use bytes stream to support all file types
InputStream in = new FileInputStream(src);
OutputStream out = new FileOutputStream(dest);
byte[] buffer = new byte[1024];
int length;
//copy the file content in bytes
while ((length = in.read(buffer)) > 0){
out.write(buffer, 0, length);
}
in.close();
out.close();
System.out.println("File copied from " + src + " to " + dest);
}
}
}
Now, I used the above code to copy "My Documents". Unfortunately, it ended with a NullPointerException after running for a while.
The error occurs when it tries to copy the "My Music" folder, which is not even inside the "My Documents" folder. I tested this code on two different machines running Windows 7 and got the same error on both.
A Windows-specific solution is fine for me, as I am targeting Windows machines at the moment. What have I done wrong?
The error I am getting is:
Directory copied from C:\Users\Yohan\Documents\My Music to D:\Test\My Music
Exception in thread "main" java.lang.NullPointerException
at CopyDirectoryExample.copyFolder(CopyDirectoryExample.java:51)
at CopyDirectoryExample.copyFolder(CopyDirectoryExample.java:56)
at CopyDirectoryExample.main(CopyDirectoryExample.java:25)
This isn't working because "My Music", "My Pictures" (or Images) and other such directories are just symbolic links. See this post on how to detect symbolic links: Java 1.6 - determine symbolic links
Unfortunately, these folders (Images, Music, Videos) are NOT considered symbolic links in Java. Using Java 8,
Files.isSymbolicLink(srcFile.toPath())
returns false, and Files.readSymbolicLink(srcFile.toPath()) fails with an "Access Denied" exception.
So you can't process them automatically. Fix your code so that it properly handles the case where srcFile.isDirectory() returns true but srcFile.listFiles() returns null.
On my Windows 8 machine, three folders were affected. My machine is set up in French, so I got a "Ma Musique" folder for which listFiles returned null. However,
new File("C:\\Users\\<user>\\Music").listFiles()
does NOT return null. So I'm afraid you'll have to hard-code special handling for the three folders (Music, Videos, Images) if you want to copy their data too.
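A minimal sketch of a copy routine that handles this case (the class and method names are hypothetical): when a directory's listing comes back null, the directory is recorded and skipped instead of causing a NullPointerException, so the caller can decide what to do with such folders, for example handling Music, Videos and Pictures separately as described above. The listing is checked before the destination folder is created, so no empty copies are left behind for skipped folders.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class SafeCopyFolder {

    /** Recursively copies src to dest and returns the directories whose contents could not be listed. */
    public static List<File> copyFolder(File src, File dest) throws IOException {
        List<File> skipped = new ArrayList<>();
        copy(src, dest, skipped);
        return skipped;
    }

    private static void copy(File src, File dest, List<File> skipped) throws IOException {
        if (src.isDirectory()) {
            String[] children = src.list();
            if (children == null) {
                // isDirectory() is true, but the contents are not accessible
                skipped.add(src);
                return;
            }
            if (!dest.exists() && !dest.mkdirs()) {
                throw new IOException("Could not create " + dest);
            }
            for (String child : children) {
                copy(new File(src, child), new File(dest, child), skipped);
            }
        } else {
            Files.copy(src.toPath(), dest.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```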
You are not handling the case where src.list() returns null (the extra length check also skips empty directories).
Make the following change and it will work:
//list all the directory contents
String files[] = src.list();
if (files!=null && files.length>0) {
for (String file : files) {
//construct the src and dest file structure
File srcFile = new File(src, file);
File destFile = new File(dest, file);
//recursive copy
copyFolder(srcFile,destFile);
}
}

Decompress folder without overwriting new files

I want to decompress a large ZIP archive with nested subdirectories into a directory that already exists. Files inside the ZIP archive may already exist in that directory. I need to keep an existing file only when its date is newer than the date of the file in the ZIP archive; if the file in the ZIP is newer, I want to overwrite it.
Is there a good strategy for doing this? I already checked TrueZIP and zip4j, but I can't find such an option (the best option for me so far is modifying the zip4j sources, but there should be a better way).
P.S. If I haven't explained this correctly, please feel free to ask. English is not my native language, so I may have expressed something wrongly.
Thanks.
With Zip4j, this is how it can be done:
import java.io.File;
import java.util.Date;
import java.util.List;
import net.lingala.zip4j.core.ZipFile;
import net.lingala.zip4j.model.FileHeader;
import net.lingala.zip4j.util.Zip4jUtil;
public class ExtractWithoutOverwriting {
public static void main(String[] args) {
try {
String outputPath = "yourOutputPath";
ZipFile zipFile = new ZipFile(new File("yourZipFile.zip"));
if (zipFile.isEncrypted()) {
zipFile.setPassword("yourPassword".toCharArray());
}
@SuppressWarnings("unchecked")
List<FileHeader> fileHeaders = zipFile.getFileHeaders();
for (FileHeader fileHeader : fileHeaders) {
if (fileHeader.isDirectory()) {
File file = new File(outputPath + System.getProperty("file.separator") + fileHeader.getFileName());
file.mkdirs();
} else {
if (canWrite(outputPath, fileHeader)) {
System.out.println("Writing file: " + fileHeader.getFileName());
zipFile.extractFile(fileHeader, outputPath);
} else {
System.out.println("Not writing file: " + fileHeader.getFileName());
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private static boolean canWrite(String outputPath, FileHeader fileHeader) {
File file = new File(outputPath + System.getProperty("file.separator") + fileHeader.getFileName());
//time stamps are stored in dos format in a zip file
//convert it to java format
long lastModifiedFromZip = Zip4jUtil.dosToJavaTme(fileHeader.getLastModFileTime());
//If the file exists, it can be overwritten only if the file in the destination path
//is newer than the one in the zip file
return !(file.exists() && isLastModifiedDateFromFileNewer(file.lastModified(), lastModifiedFromZip));
}
public static boolean isLastModifiedDateFromFileNewer(long lastModifiedFromFile, long lastModifiedFromZip) {
Date lastModifiedDateFromFile = new Date(lastModifiedFromFile);
Date lastModifiedDateFromZip = new Date(lastModifiedFromZip);
return lastModifiedDateFromFile.after(lastModifiedDateFromZip);
}
}
What we do here is:
Create a new instance of the ZipFile
If the zip file is encrypted, set a password
Loop over all files in the zip file
Check if a file with this name exists in the output path and if this file's last modification time is "newer" than the one in the zip file. This check is done in the method: canWrite()
This code is not completely tested, but I hope it gives you an idea of a solution.
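For comparison, here is a sketch of the same check without zip4j, using only java.util.zip (this swaps the JDK's ZipFile in for the zip4j API above; the class name is hypothetical). It keeps an existing file when its last-modified time is newer than the entry's time in the zip, rejects entry names that would escape the output folder, and sets each extracted file's modification time to the entry's time so that later runs compare like with like.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ExtractKeepNewer {

    /** Extracts zipPath into outputDir without overwriting newer files; returns the number of files written. */
    public static int extract(Path zipPath, Path outputDir) throws IOException {
        int written = 0;
        Path root = outputDir.toAbsolutePath().normalize();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Path target = root.resolve(entry.getName()).normalize();
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside output directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                    continue;
                }
                long zipTime = entry.getTime(); // -1 if the zip stores no time
                if (Files.exists(target) && Files.getLastModifiedTime(target).toMillis() > zipTime) {
                    continue; // the file on disk is newer: keep it
                }
                Files.createDirectories(target.getParent());
                try (InputStream in = zip.getInputStream(entry)) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                if (zipTime != -1) {
                    Files.setLastModifiedTime(target, FileTime.fromMillis(zipTime));
                }
                written++;
            }
        }
        return written;
    }
}
```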

List of files inside a zip folder and its subfolder

I am looking for a way to get the list of files inside a zip file. I wrote a method that lists the files inside a directory, but I also want it to list the files inside a zip instead of just showing the zip file itself.
Here is my method:
public ArrayList<String> listFiles(File f, String min, String max) {
try {
// parse input strings into date format
Date minDate = sdf.parse(min);
Date maxDate = sdf.parse(max);
//
File[] list = f.listFiles();
if (list == null) { // f is not a directory or cannot be read
return lss;
}
for (File file : list) {
double bytes = file.length();
double kilobytes = (bytes / 1024);
if (file.isFile()) {
String fileDateString = sdf.format(file.lastModified());
Date fileDate = sdf.parse(fileDateString);
if (fileDate.after(minDate) && fileDate.before(maxDate)) {
lss.add("'" + file.getAbsolutePath() +
"'" + " Size KB:" + kilobytes + " Last Modified: " +
sdf.format(file.lastModified()));
}
} else if (file.isDirectory()) {
listFiles(file.getAbsoluteFile(), min, max);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return lss;
}
After searching for a while, I finally found a better way to do this: you can do the same thing more generically with the Java NIO API (since Java 7).
// this is the URI of the Zip file itself (the zip file system provider uses the jar: scheme)
URI zipUri = ...;
FileSystem zipFs = FileSystems.newFileSystem(zipUri, Collections.emptyMap());
// The path within the zip file you want to start from
Path root = zipFs.getPath("/");
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path path, BasicFileAttributes attrs) throws IOException {
// You can do anything you want with the path here
System.out.println(path);
// the BasicFileAttributes object has lots of useful meta data
// like file size, last modified date, etc...
return FileVisitResult.CONTINUE;
}
// The FileVisitor interface has more methods that
// are useful for handling directories.
});
This approach has the advantage that you can traverse any file system this way: your normal Windows or Unix file system, the file system contained within a zip or a jar, or really any other.
You can then trivially read the contents of any Path via the Files class, using methods like Files.copy(), Files.readAllLines(), Files.readAllBytes(), etc.
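A self-contained sketch of that approach (the class name is hypothetical). Instead of a jar: URI, it opens the zip file system from a Path with FileSystems.newFileSystem(Path, ClassLoader), which exists since Java 7, and uses Files.walk (Java 8) to collect every regular file in the zip, including those in its subfolders.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListZip {

    /** Returns the paths of all regular files inside the zip, e.g. "/sub/b.txt". */
    public static List<String> listZipFiles(Path zipPath) throws IOException {
        // the cast selects the (Path, ClassLoader) overload; null means "installed providers only"
        try (FileSystem zipFs = FileSystems.newFileSystem(zipPath, (ClassLoader) null);
             Stream<Path> walk = Files.walk(zipFs.getPath("/"))) {
            return walk.filter(Files::isRegularFile)
                       .map(Path::toString)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }
}
```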
You can use the ZipFile.entries() method to iterate over the entries of each zip file:
File[] fList = directory.listFiles();
for (File file : fList)
{
// only open regular files that look like zip archives
if (!file.isFile() || !file.getName().toLowerCase().endsWith(".zip"))
{
continue;
}
// try-with-resources closes the ZipFile when done
try (ZipFile myZipFile = new ZipFile(file))
{
Enumeration<? extends ZipEntry> zipEntries = myZipFile.entries();
while (zipEntries.hasMoreElements())
{
System.out.println(zipEntries.nextElement().getName());
// you can do whatever you want with each zip entry
}
}
}

check if the file is of a certain type

I want to validate that all the files in a directory are of a certain type. Here is what I have so far:
private static final String[] IMAGE_EXTS = { "jpg", "jpeg" };
private void validateFolderPath(String folderPath, final String[] ext) {
File dir = new File(folderPath);
int totalFiles = dir.listFiles().length;
// Filter the files with JPEG or JPG extensions.
File[] matchingFiles = dir.listFiles(new FileFilter() {
public boolean accept(File pathname) {
return pathname.getName().endsWith(ext[0])
|| pathname.getName().endsWith(ext[1]);
}
});
// Check if all the files have JPEG or JPG extensions
// Terminate if validation fails.
if (matchingFiles.length != totalFiles) {
System.out.println("All the files should be of type " + ext[0]
+ " or " + ext[1]);
System.exit(0);
} else {
return;
}
}
This works fine if the file names have an extension, as in {file.jpeg, file.jpg}.
It fails if the files have no extension, as in {file1 file2}.
When I do the following in my terminal I get:
$ file folder/file1
folder/file1: JPEG image data, JFIF standard 1.01
Update 1:
I tried to get the magic numbers of the file to check if it is JPEG:
File[] files = dir.listFiles();
for (int i = 0; i < files.length; i++) {
// try-with-resources closes the stream whether or not the check passes
try (DataInputStream input = new DataInputStream(
new BufferedInputStream(new FileInputStream(files[i])))) {
// 0xFFD8FFE0: SOI marker followed by an APP0 (JFIF) marker
isJPEGFlag = input.readInt() == 0xffd8ffe0;
}
if (!isJPEGFlag) {
System.out.println("File not JPEG");
System.exit(0);
}
}
I ran into another problem: there are some .DS_Store files in my folder.
Any idea how to ignore them?
First, file extensions are not mandatory; a file without an extension could very well be a valid JPEG file.
Check the specification of the JPEG format: file formats generally start with a fixed sequence of bytes that identifies the format of the file. This is definitely not straightforward, but I am not sure there is a better way.
In a nutshell, you have to open each file, read the first n bytes (n depends on the file format) and check whether they match the format you expect. If they do, it is very likely a valid JPEG file, even if it has an .exe extension or no extension at all.
For JPEGs, you can do the magic-number check on the file header (this example is C#):
static bool HasJpegHeader(string filename)
{
using (BinaryReader br = new BinaryReader(File.Open(filename, FileMode.Open)))
{
UInt16 soi = br.ReadUInt16();
UInt16 jfif = br.ReadUInt16();
return soi == 0xd8ff && jfif == 0xe0ff;
}
}
A more complete method, which also covers EXIF, is here: C# How can I test a file is a jpeg?
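A Java equivalent of the header check, as a sketch (the class and method names are hypothetical). It checks for the bytes FF D8 FF, that is the SOI marker followed by the start of the next marker, which matches both JFIF (FF E0) and EXIF (FF E1) files; a matching header is a strong hint, not proof, that the file is a valid JPEG. For the .DS_Store part of the question, it skips names starting with ".", the hidden-file convention on macOS and Unix; treating every such file as ignorable is an assumption.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class JpegCheck {

    /** True if the file starts with FF D8 FF (JPEG SOI marker plus the start of the next marker). */
    public static boolean hasJpegHeader(File file) throws IOException {
        byte[] header = new byte[3];
        try (InputStream in = new FileInputStream(file)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n == -1) {
                    return false; // file is shorter than the header
                }
                read += n;
            }
        }
        return (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8 && (header[2] & 0xFF) == 0xFF;
    }

    /** True if every non-hidden regular file in dir has a JPEG header; files like .DS_Store are ignored. */
    public static boolean allJpeg(File dir) throws IOException {
        File[] files = dir.listFiles(f -> f.isFile() && !f.getName().startsWith("."));
        if (files == null) {
            throw new IOException("Cannot list " + dir);
        }
        for (File f : files) {
            if (!hasJpegHeader(f)) {
                return false;
            }
        }
        return true;
    }
}
```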
One good (though expensive) check that a file is an image J2SE understands is to try to read it with ImageIO.read(File). That method returns null if no registered image reader recognizes the file, and throws an IOException if an error occurs while reading it.
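A sketch of that check (the class name is hypothetical). Both outcomes of a failed read, a null result and an IOException, are treated as "not an image". Decoding the whole image costs far more than a header check, so this fits best when the files must actually be usable as images.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class ImageCheck {

    /** True if one of the registered ImageIO readers can decode the file as an image. */
    public static boolean isReadableImage(File file) {
        try {
            BufferedImage img = ImageIO.read(file); // null when no reader recognizes the format
            return img != null;
        } catch (IOException e) {
            return false; // unreadable or corrupt file
        }
    }
}
```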
