Just wondering if it is generally a good idea to compress jar files that will be shipped with a desktop application (no network access to jars), or if the decompression will have a bigger impact than file IO.
EDIT: Thanks for the answers so far, and sorry for being a bit unclear here. I was not speaking about shipping the jars to the customer, but about the optimal format for the jar files on disk when the app starts up. I know that jar files are zip files and can be stored with different compression levels (or no compression at all), and I was wondering how compression would alter startup performance, not only on my dev box (which has a fast SSD) but also on slower disks.
I expect that the answer depends on your application. However, it should be easy to determine experimentally if compressed JARs give faster or slower startup for your application. Just build your application JAR file with compression on and compression off, and compare the application startup times. (Try it on different machines; e.g. with slow discs, fast discs, SSD, and with different amounts of RAM. Bear in mind that some OSes cache files aggressively, and take this into account in your timing measurements.)
While you are at it, you should also investigate the impact of different compression levels (via the jar command options) and of using pack200 (on JDKs that still ship it; it was removed in Java 14).
Having said that, my gut feeling is that the difference between compressed and uncompressed for locally installed JARs will be small enough that the user will hardly notice the difference.
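A rough sketch of that kind of experiment, in case it helps. It only times a sequential read of two archives written with different Deflater levels, so it is a stand-in for (not a replacement of) timing your real application's startup. The class name, payload and entry counts are made up for illustration. Note that level 0 here still uses the DEFLATED method with no compression, which is not quite the same as the STORED entries produced by the jar tool's no-compression option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class JarCompressionTiming {

    // Hypothetical payload size; real class files vary a lot.
    static final int PAYLOAD_SIZE = 20_000;

    // Writes `entries` identical, highly repetitive entries at the given Deflater level.
    static void writeArchive(Path target, int level, int entries) throws IOException {
        byte[] payload = new byte[PAYLOAD_SIZE];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) ('a' + i % 16);
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            zip.setLevel(level);
            for (int i = 0; i < entries; i++) {
                zip.putNextEntry(new ZipEntry("entry" + i + ".bin"));
                zip.write(payload);
                zip.closeEntry();
            }
        }
    }

    // Reads every entry to the end and returns the total number of uncompressed bytes.
    static long readAll(Path archive) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(archive))) {
            while (zip.getNextEntry() != null) {
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    total += n;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path plain = Files.createTempFile("level0", ".jar");
        Path packed = Files.createTempFile("level9", ".jar");
        try {
            writeArchive(plain, Deflater.NO_COMPRESSION, 200);
            writeArchive(packed, Deflater.BEST_COMPRESSION, 200);
            for (Path p : new Path[] {plain, packed}) {
                long start = System.nanoTime();
                long bytes = readAll(p);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.println(p.getFileName() + ": " + Files.size(p) + " bytes on disk, "
                        + bytes + " bytes read in " + micros + " us");
            }
        } finally {
            Files.deleteIfExists(plain);
            Files.deleteIfExists(packed);
        }
    }
}
```

Since the files were just written, they will almost certainly be served from the OS file cache, so for disk-bound numbers you would need to run the read in a fresh process after a reboot or after clearing the cache.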
In almost any reasonable desktop situation, the cost of disk IO is way higher than the cost of compression. It'll almost certainly be a win to compress files.
That said, a JAR file is already compressed. Doubly compressing things is generally not worth the effort. So I'd say no, don't compress your JAR files as they are already compressed.
I have an issue with fragmentation on my drive. I have a program that generates over 50000 files in different folders, and each file grows over time. Each file will be about 500MB in size and I need to read the files fast.
The issue I am facing is that each file will be spread over the drive, and defragmentation would take over 4 weeks.
I heard about a filesystem that will spread each file on the drive so that the gap between each file is the same size. I searched the internet for that filesystem but I couldn't find anything.
My program is written in Java, maybe there is a way to set the beginning of a file on a specific byte position on the drive.
I would be glad if someone could help me with this issue.
I heard about a filesystem that will spread each file on the drive so that the gap between each file will be the same size. I searched the internet for that filesystem but I couldn't find anything.
Most likely you did not because it does not exist...
But we have RAID systems (Redundant Array of Inexpensive Disks) which could ease your pain...
As Timothy said, you can't get to that level by using Java.
I haven't heard of that filesystem either, and it doesn't sound very logical.
Perhaps, if you are storing text, you can use a NoSQL database (like MongoDB) that stores data in binary form. You'll probably get good speeds, and the Java connector is easy to use.
Use a Linux filesystem like ext4 where disk fragmentation is very low but also make sure you have plenty of disk space left else fragmentation will happen anyway.
I also don't know of a file system that does this. However, I have some info that may help:
If you used an SSD, then fragmentation would be less of a concern for reading performance reasons. SSDs store data in chunks - NAND flash pages, 16 KB for instance. These are always stored in scattered order due to the wear-levelling algorithm used. That is very unlike how hard disks work in practice. Pages on SSDs are accessed in a very parallel fashion as well. As a result, you would have much less impact of fragmentation on reading performance with an SSD. Fragmentation would still have some penalty for writes/deletions.
RAID would also allow for higher performance on reads as Timothy mentions.
I have the following requirement:
I need to unpack a zip or tar.gz file. The target platform where this code will run is an AEM 6.1 environment. The system had some performance issues in the past; in particular, memory usage was much too high, so I have to save memory. The zip/tar.gz file contains some text, SVG, PNG and EPS files, as well as other files I don't need. The archive file will be downloaded and available as a ByteArrayInputStream.
I did some research to figure out the best way to do that. Apache Commons provides libraries to unpack archives, as does the JDK, but I could not figure out which implementation uses less memory.
I think it would be best if I could open the archive while it is still compressed, and read and unpack the contained files one at a time. That way I would only have the compressed archive and one of its files in memory.
But I'm not sure which implementation makes this possible, or whether there is a better way to do that.
Does anybody have good advice?
Thank you and best regards.
ZipInputStream from the JDK does just what you need: https://docs.oracle.com/javase/8/docs/api/java/util/zip/ZipInputStream.html
You can find the entry you need via getNextEntry().getName(), and read the bytes just for that entry. The ZipInputStream.read method allows you to implement buffered read, so you can easily limit the memory consumption if you don't need the whole decompressed entry in memory (i.e. if you write the entry into output as you read it).
In this case you can minimize the footprint of your application as well, since you won't need any extra libraries.
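A minimal sketch of that approach, assuming you pick entries by file extension and write them straight to some output stream. The class and method names (ZipEntryExtractor, copyMatchingEntries) and the sample data are hypothetical; only the java.util.zip calls are real API. Apart from the archive itself, only the 8 KB buffer is held in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipEntryExtractor {

    // Streams every entry whose name ends with `suffix` into `sink`, one buffer at a time.
    // Returns the number of uncompressed bytes copied.
    static long copyMatchingEntries(InputStream archive, String suffix, OutputStream sink)
            throws IOException {
        long copied = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory() || !entry.getName().endsWith(suffix)) {
                    continue; // the next getNextEntry() call skips this entry's data
                }
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    sink.write(buffer, 0, n);
                    copied += n;
                }
            }
        }
        return copied;
    }

    // Builds a tiny two-entry archive in memory, standing in for the downloaded file.
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("notes.txt"));
            zip.write("not needed".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("logo.svg"));
            zip.write("<svg/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyMatchingEntries(new ByteArrayInputStream(sampleArchive()), ".svg", sink);
        System.out.println(copied + " bytes copied: " + sink.toString("UTF-8"));
    }
}
```

In real code the sink would more likely be a file or repository output stream than a ByteArrayOutputStream, otherwise the decompressed entry ends up in memory after all. This covers zip only; the JDK has GZIPInputStream but no tar reader, so tar.gz would need something like Apache Commons Compress.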
Our system has a problem with too many files. They are used by a webapp that has to be available all the time, which means the files cannot be deleted, and there are so many of them that they make the system (a Windows machine) slow. We would like to zip up the files and, when a file is requested, unzip just that file.
I've tried the Java ZipFile class, and the performance is not good enough, because many people will be using the webapp and requesting files. From my observation, unzipping takes between 0.5 and 2 seconds, and when there are too many users, the system cannot keep up with them.
For example, I used JMeter to simulate 30 users using the system, with a random delay of 0.3 to 0.6 seconds between requests. Although I doubt there will be that many requests, I cannot know in advance how many people will use the webapp. Is there any other way to solve this problem?
Thanks in advance!!
P.S. If any 3rd party library is needed, it must be free!
P.S. The number of files is simply too large, and it hangs the machine. We would like to do this: zip up 2000 files into one zip file, so that the number of files decreases and, hopefully, the system won't hang anymore; when needed, we unzip individual files.
Okay, here's some thoughts. It appears to me that your core problem is the slowness of your system and that you're trying to fix it by compressing the files and decompressing them on demand. Then you've found that the decompression is too slow and you need a faster way to do that.
Now I'm not entirely certain why you think this compression will speed things up instead of making things slower.
I would go back to the original problem and work more on solving that. Why is the number of files making your system slow? If you can figure that out, you can fix it in a way that doesn't involve things going even slower.
If it's an issue with too many files in a directory, think about splitting into multiple directories. But I have no idea whether NTFS even has that problem (FAT did). For example, if you have a directory with files for every minute of the last ten years (five million files), you can split them into day directories (three and a half thousand directories with fifteen hundred files in each).
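To make the day-directory idea concrete, here is a small sketch. It assumes each file has a timestamp you can key on and that one flat directory per UTC day is acceptable. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Instant;
import java.time.ZoneOffset;

class DayDirectoryLayout {

    // Maps a file to <root>/<yyyy-MM-dd>/<fileName>, using the UTC date of its timestamp.
    static Path pathFor(Path root, Instant createdAt, String fileName) {
        String day = createdAt.atZone(ZoneOffset.UTC).toLocalDate().toString();
        return root.resolve(day).resolve(fileName);
    }

    // Writes the content to its day directory, creating the directory if needed.
    static Path store(Path root, Instant createdAt, String fileName, byte[] content)
            throws IOException {
        Path target = pathFor(root, createdAt, fileName);
        Files.createDirectories(target.getParent());
        return Files.write(target, content);
    }

    public static void main(String[] args) {
        // One file per minute ends up as at most 1440 files per directory.
        System.out.println(pathFor(Paths.get("data"), Instant.parse("2015-06-01T12:34:00Z"), "1234.dat"));
    }
}
```

If the files have no natural date, the same pattern works with a different key, for example the first characters of a hash of the file name.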
Compression won't reduce the number of files, just the space taken by them.
If it's an issue with the number of files on the system (rather than in a directory), there are plenty of ways to split files between systems as well. For example, hive off 10% of the entire file set to each of ten different machines and forward incoming requests for a specific file to the relevant machine.
But, I have to say, I've seen Windows machines handle absolute bucket-loads of files so I'd be very surprised if the problem lay there. I think you're probably just going to have to track down what's actually causing your "hangs".
Compressing/uncompressing the files will not make Windows faster.
If zip doesn't provide a performance gain (despite having a native implementation in Java), you can try to improve things at the filesystem level. Folders with too many (>10000) files don't work well under some Windows filesystems, so try dividing the files into several folders, tuning the NTFS filesystem (cluster size, reserved space for the filesystem), disabling antivirus, disabling indexing, buying an SLC SSD...
We've been struggling here at work with somebody's suggestion that we should decrease the size of our war file, specifically the size of the WEB-INF/lib directory, in order to improve the performance of our production JBoss instance. I'm still suspicious about that.
We have around 15 web apps deployed in our application server, each about 15 to 20 MB in size.
I know there are a lot of variables involved in this, but has any of you actually dealt with this situation? Does the size of the .war files actually have a significant impact on web containers in general?
What advice can you offer?
Thank you.
There are many things to be suspicious of here:
What about the application is not performing to the level you would like?
Have you measured the application to find out which components are causing the lack of performance?
What are the bottlenecks in the application/system?
The size of the application alone has nothing to do with any sort of runtime performance. The number of classes loaded during the lifetime of the application has an impact on memory usage of the application, but an incredibly negligible one.
When dealing with "performance issues", the solution always follows the same general steps:
What does it mean when we say "bad performance"?
What specifically is not performing? Measure, measure, measure.
Can we improve the specific component not performing to the level we want?
If so, implement the ideas, measure again to find out if performance has truly improved.
Need you to tell us the operating system.
Do you have antivirus live protection?
A war/jar file is actually a zip file - i.e., if you rename a .war to a .zip, you can use a zip utility to view/unzip it.
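Because a war is a zip file, you can also inspect it from Java with java.util.zip.ZipFile, for instance to see what actually sits under WEB-INF/lib. A minimal sketch follows. The class name and the command-line argument are just for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class WarContents {

    // Returns the names of all non-directory entries whose path starts with `prefix`.
    static List<String> entriesUnder(Path war, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(war.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith(prefix)) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Expects the path to a .war file as the first argument.
        for (String name : entriesUnder(Paths.get(args[0]), "WEB-INF/lib/")) {
            System.out.println(name);
        }
    }
}
```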
During deployment, the war file is unzipped once into a designated folder. If you have live-protection, the antivirus utility might take some time to scan the new branch of directories created and slow down any access to them.
Many web app frameworks, like JSPs, create temporary files and your live-protection would get into action to scan them.
If this is your situation, you have to decide whether you wish to exclude your web-app from antivirus live-scanning.
Are you running Linux but your web directory is accessed using ntfs-3g? If so, check if the ntfs directory is compressed. ntfs-3g has problems accessing compressed ntfs files, especially when multiple files are manipulated/created/uncompressed simultaneously. In the first place, unless there are some extremely valid reasons (and I can't see any), a web app directory should be on a local partition in a format native to Linux.
Use wireshark to monitor the network activity. Find out if web apps are causing accesses to remote file systems. See if there are too many retransmits whenever the web apps are active. Excessive retransmits or requests for retransmits means the network pipeline has integrity problems. I am still trying to understand this issue myself - some network cards have buffering problems (as though buffer overflow) operating in Linux but not in Windows.
Wireshark is not difficult to use as long as you have an understanding of ip addresses, and you might wish to write awk, perl or python scripts to analyze the traffic. Personally, I would use SAS.
I have a command-line executable which I need to run from Java on Windows XP. It uses files as input and output. But I want to avoid the overhead of file IO, so I thought of an in-memory RAM file system.
NetBSD has mount_mfs.
Could you recommend the most convenient way of doing this?
You should also consider whether you really need this (premature optimization, yadda, yadda). On all modern operating systems, filesystem I/O is cached anyway, so frequently-used files are essentially as fast as a RAM disk.
Related question (with many good answers):
RAM drive for compiling - is there such a thing?
Commons VFS provides handy interfaces to virtual filesystems, including an in-memory file system.