I am trying to parse a pcap file two different ways using two different methods. The pcap file is passed to the constructor of the class that contains both methods. When I loop over the pcap file in the first method, there is no problem. However, when I parse it a second time in the second method, nothing happens when I try to print each packet. I tried passing the pcap file directly to the second method, and still no dice. Do I need to reset a counter/pointer? Any ideas?
How pcap file is loaded from disk
pcap = Pcap.openStream(pcapPath);
How class constructor intakes pcap file
public PcapParsing(Pcap pcap) {
this.pcap = pcap;
}
How both methods parse the pcap file
public void arpFloodDetect(Pcap ppcap)
{
try {
ppcap.loop((final Packet packet) -> {
System.out.println(ppcap.toString());
return true;
});
} catch (IOException e) {
e.printStackTrace();
}
}
Do I need to reset a counter/pointer?
You need to create a new Pcap by calling Pcap.openStream again. The Pcap API does not expose any methods for resetting the underlying stream.
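The effect is the same as with any one-shot stream. Here is a minimal sketch using plain java.io rather than the Pcap API: the `opener` supplier is a hypothetical stand-in for calling Pcap.openStream(pcapPath) again, and `drain` stands in for a loop over packets.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.function.Supplier;

// Plain java.io analogy (not the Pcap API): once a stream has been read to the
// end, a second pass over the same object finds nothing left.
class TwoPassDemo {
    // Counts the bytes remaining in the stream, consuming it to the end.
    static int drain(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4};
        // Hypothetical factory standing in for "open the capture again".
        Supplier<InputStream> opener = () -> new ByteArrayInputStream(data);

        InputStream shared = opener.get();
        System.out.println(drain(shared));       // first pass: 4
        System.out.println(drain(shared));       // second pass, same stream: 0
        System.out.println(drain(opener.get())); // freshly opened stream: 4
    }
}
```

In the original class this would mean keeping pcapPath (or something that can reopen the capture) as the field, and opening a new Pcap inside each method.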
Pcap files can get large, a couple of gigabytes or more. Will this add a significant load penalty each time I call it?
It depends on how good your file system is. If we assume that your file system is on a fast local SSD, and you are running an OS which uses RAM for file system buffer caching, then reading a big file will be fast the first time, and faster the second time.
It also depends on what you mean by "significant", and what is acceptable. And how much money you are prepared to pay to upgrade your hardware to achieve acceptable performance.
Would you happen to know a different way of loading files that avoids a penalty if there is one?
Basically, no.
The only other alternatives I can think of involve reading or mapping the entire file into the JVM's address space and then wrapping it in an InputStream. You still need to create a Pcap for each pass through the file.
But the problem with this is that it requires as much JVM address space as the size of the file you are processing. If the file is significantly bigger than the amount of physical RAM available, it can get horrible:
In the best case your performance will be equivalent to re-reading the file from disk.
In the worst case your application thrashes and brings the operating system to its knees (or gets OOM-killed to prevent that).
The current Pcap implementation is designed to avoid that by not caching the data in RAM. That is how it is able to cope with huge input files without running out of memory, etc.
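For a capture that comfortably fits in the heap, the "read the entire file and wrap it" alternative might look like the sketch below. The class name InMemoryCapture is made up for illustration, and the sketch assumes your version of the library has a Pcap.openStream overload that accepts an InputStream; check that before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reads the file from disk once and hands out a fresh
// InputStream over the same bytes for every pass.
class InMemoryCapture {
    private final byte[] bytes;

    InMemoryCapture(Path file) throws IOException {
        // The whole file ends up on the Java heap. Files larger than about
        // 2 GB cannot be held in a single array, so this only suits small captures.
        this.bytes = Files.readAllBytes(file);
    }

    // Each call returns an independent stream positioned at the start.
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }
}
```

Each pass would then wrap capture.newStream() in a new Pcap (assumed overload, see above). For multi-gigabyte files the caveats above apply, and simply reopening the file is the safer choice.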
I have created a program that loads an image with FileDialog, resizes it, previews it to the user, and saves it to a folder after a button click.
My problem is:
when I run my program - RAM usage ~50mb
loading 1mb JPG file - RAM usage ~93mb
saving 1mb JPG file - RAM usage ~160mb
I intend this program to be lightweight, but after 3-4 files it occupies 500mb RAM space.
I tried calling System.gc(); every time the user saves a file, but it reduced RAM usage by only ~10%.
Below is the code for loading and saving images; the full code can be found HERE.
BTW, why does the size increase to 10mb after loading a 1mb JPG and then saving it?
Code for loading image:
FileDialog imageFinder = new FileDialog((Frame)null, "Select file to open:");
imageFinder.setFile("*.jpg; *.png; *.gif; *.jpeg");
imageFinder.setMode(FileDialog.LOAD);
imageFinder.setVisible(true);
userImagePath = new File(imageFinder.getDirectory()).getAbsolutePath()+"\\"+imageFinder.getFile();
userImagePath = userImagePath.replace("\\", "/");
Code for saving image:
BufferedImage bimage = new BufferedImage(userImage.getWidth(null), userImage.getHeight(null), BufferedImage.TYPE_INT_ARGB);
Graphics2D bGr = bimage.createGraphics();
bGr.drawImage(userImage, 0, 0, null);
bGr.dispose();
try {
BufferedImage bi = bimage;
File outputfile = new File("C:\\Users\\Mariola\\git\\MySQL-viwer\\MySQL viewer\\src\\database_images\\"+userBreedInfo[0]+".jpg");
ImageIO.write(bi, "png", outputfile);
} catch (IOException e1) {
}
}
System.gc()
The "problem" is that ImageIO uses quite a lot of memory. That memory is then not returned to the OS (which is why even an unnecessary call to System.gc() will not release it), because that is how the JVM works. (Java 13 promises that memory will be returned to the OS?) As @Andrew Thompson pointed out in the comment section, if you want lower memory consumption, take a look at this question. If you run it, you will see that with a memory limit it does not consume as much, which tells you not to worry about it. The JVM will do its magic and handle memory consumption according to how much memory the OS says is free.
If it still bothers you, you could look for ImageIO alternatives that may behave differently. In my opinion, though, it is not worth it for your needs. I mean, you just want to save/load an image.
Another question worth reading is Why is it bad practice to call System.gc()?
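If you want to see the difference between memory held by live objects and memory the JVM has merely claimed from the OS, a rough sketch like this can help. The class name HeapReport is made up; the Runtime calls are standard. The number in Task Manager tracks the claimed heap (plus non-heap memory) more closely than the live-object figure.

```java
// Rough sketch: compares heap in use with heap the JVM has claimed from the OS.
class HeapReport {
    // Bytes currently occupied by objects (live ones plus not-yet-collected garbage).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Bytes of heap the JVM currently holds, whether or not they are in use.
    static long committedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("used      = " + usedBytes() / mb + " MB");
        System.out.println("committed = " + committedBytes() / mb + " MB");
        System.out.println("max       = " + Runtime.getRuntime().maxMemory() / mb + " MB");
    }
}
```

Printing these after each save should show "used" dropping after a collection while "committed" stays high, which is the behaviour described above.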
I am building a java server that needs to scale. One of the servlets will be serving images stored in Amazon S3.
Recently under load, I ran out of memory in my VM and it was after I added the code to serve the images so I'm pretty sure that streaming larger servlet responses is causing my troubles.
My question is : is there any best practice in how to code a java servlet to stream a large (>200k) response back to a browser when read from a database or other cloud storage?
I've considered writing the file to a local temp drive and then spawning another thread to handle the streaming so that the tomcat servlet thread can be re-used. This seems like it would be io heavy.
Any thoughts would be appreciated. Thanks.
When possible, you should not store the entire contents of a file to be served in memory. Instead, acquire an InputStream for the data, and copy the data to the Servlet OutputStream in pieces. For example:
ServletOutputStream out = response.getOutputStream();
InputStream in = [ code to get source input stream ];
String mimeType = [ code to get mimetype of data to be served ];
byte[] bytes = new byte[FILEBUFFERSIZE];
int bytesRead;
response.setContentType(mimeType);
while ((bytesRead = in.read(bytes)) != -1) {
out.write(bytes, 0, bytesRead);
}
// do the following in a finally block:
in.close();
out.close();
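The same loop can be written so that both streams are closed even when an exception is thrown, using try-with-resources. This is a minimal standalone sketch: the class name StreamCopy and the 8 KB buffer size are assumptions, and in a servlet the sink would be response.getOutputStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    // Assumed buffer size; only this much data is held in memory per request.
    static final int BUFFER_SIZE = 8 * 1024;

    // Copies source to sink in fixed-size chunks, closes both, returns bytes copied.
    static long copy(InputStream source, OutputStream sink) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        // try-with-resources closes both streams on success and on failure.
        try (InputStream in = source; OutputStream out = sink) {
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```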
I do agree with toby, you should instead "point them to the S3 url."
As for the OOM exception, are you sure it has to do with serving the image data? Let's say your JVM has 256MB of "extra" memory to use for serving image data. With Google's help, "256MB / 200KB" = 1310. For 2GB "extra" memory (these days a very reasonable amount) over 10,000 simultaneous clients could be supported. Even so, 1300 simultaneous clients is a pretty large number. Is this the type of load you experienced? If not, you may need to look elsewhere for the cause of the OOM exception.
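The back-of-the-envelope numbers above can be reproduced as follows, under the pessimistic assumption that every client has its entire 200 KB response held in memory at once. The class and method names are made up for illustration.

```java
// Worst-case estimate: each in-flight response is fully buffered in memory.
class ClientEstimate {
    static long maxClients(long spareBytes, long bytesPerResponse) {
        return spareBytes / bytesPerResponse;
    }

    public static void main(String[] args) {
        long kb = 1024;
        long mb = 1024 * kb;
        long gb = 1024 * mb;
        System.out.println(maxClients(256 * mb, 200 * kb)); // 1310
        System.out.println(maxClients(2 * gb, 200 * kb));   // 10485
    }
}
```

With chunked copying instead of full buffering, the per-client cost is one buffer rather than the whole image, so the real ceiling is far higher.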
Edit - Regarding:
In this use case the images can contain sensitive data...
When I read through the S3 documentation a few weeks ago, I noticed that you can generate time-expiring keys that can be attached to S3 URLs. So, you would not have to open up the files on S3 to the public. My understanding of the technique is:
1. Initial HTML page has download links to your webapp
2. User clicks on a download link
3. Your webapp generates an S3 URL that includes a key that expires in, let's say, 5 minutes.
4. Send an HTTP redirect to the client with the URL from step 3.
5. The user downloads the file from S3. This works even if the download takes more than 5 minutes - once a download starts it can continue through completion.
Why wouldn't you just point them to the S3 url? Taking an artifact from S3 and then streaming it through your own server to me defeats the purpose of using S3, which is to offload the bandwidth and processing of serving the images to Amazon.
I've seen a lot of code like john-vasilef's (currently accepted) answer, a tight while loop reading chunks from one stream and writing them to the other stream.
The argument I'd make is against needless code duplication, in favor of using Apache's IOUtils. If you are already using it elsewhere, or if another library or framework you're using is already depending on it, it's a single line that is known and well-tested.
In the following code, I'm streaming an object from Amazon S3 to the client in a servlet.
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
InputStream in = null;
OutputStream out = null;
try {
in = object.getObjectContent();
out = response.getOutputStream();
IOUtils.copy(in, out);
} finally {
IOUtils.closeQuietly(in);
IOUtils.closeQuietly(out);
}
6 lines of a well-defined pattern with proper stream closing seems pretty solid.
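If you are on Java 9 or later and would rather not pull in a dependency, InputStream.transferTo gives a similar one-liner. This is a swapped-in JDK alternative, not the IOUtils approach above; the class name is made up, and in the servlet case the arguments would be object.getObjectContent() and response.getOutputStream().

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class TransferToCopy {
    // Streams everything from source to sink and closes both; returns bytes copied.
    // Requires Java 9+ for InputStream.transferTo.
    static long copy(InputStream source, OutputStream sink) throws IOException {
        try (InputStream in = source; OutputStream out = sink) {
            return in.transferTo(out);
        }
    }
}
```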
toby is right, you should be pointing straight to S3, if you can. If you cannot, the question is a little too vague to give an accurate response:
How big is your java heap? How many streams are open concurrently when you run out of memory?
How big is your read/write buffer (8K is good)?
You are reading 8K from the stream, then writing 8k to the output, right? You are not trying to read the whole image from S3, buffer it in memory, then sending the whole thing at once?
If you use 8K buffers, you could have 1000 concurrent streams going in ~8Megs of heap space, so you are definitely doing something wrong....
BTW, I did not pick 8K out of thin air; it is the default size for socket buffers. Send more data at once, say 1 MB, and you will be blocking on the TCP/IP stack while holding a large amount of memory.
I agree strongly with both toby and John Vasileff: S3 is great for offloading large media objects if you can tolerate the associated issues. (An instance of my own app does that for 10-1000MB FLVs and MP4s.) For example: no partial requests (byte range header), which one has to handle 'manually', occasional downtime, etc.
If that is not an option, John's code looks good. I have found that a byte buffer of 2k FILEBUFFERSIZE is the most efficient in microbenchmarks. Another option might be a shared FileChannel. (FileChannels are thread-safe.)
That said, I'd also add that guessing at what caused an out of memory error is a classic optimization mistake. You would improve your chances of success by working with hard metrics.
Place -XX:+HeapDumpOnOutOfMemoryError into your JVM startup parameters, just in case
Use jmap on the running JVM (jmap -histo <pid>) under load
Analyze the metrics (jmap -histo output, or have jhat look at your heap dump). It may very well be that your out of memory error is coming from somewhere unexpected.
There are of course other tools out there, but jmap & jhat come with Java 5+ 'out of the box'
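If attaching external tools under load is awkward, a cheap complement is to log heap figures from inside the application, for example from a periodic task. A minimal sketch using the standard java.lang.management API (the class name HeapProbe is made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapProbe {
    // Returns a one-line summary of current heap usage, suitable for a log file.
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024 * 1024;
        // getMax() may return -1 when no limit is defined; that prints as 0 MB here.
        return String.format("heap used=%d MB, committed=%d MB, max=%d MB",
                heap.getUsed() / mb, heap.getCommitted() / mb, heap.getMax() / mb);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

This only tells you how much memory is used, not what is using it; for that you still need the histogram or a heap dump.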
I've considered writing the file to a local temp drive and then spawning another thread to handle the streaming so that the tomcat servlet thread can be re-used. This seems like it would be io heavy.
Ah, I don't think you can do that. And even if you could, it sounds dubious. The tomcat thread that is managing the connection needs to be in control. If you are experiencing thread starvation then increase the number of available threads in ./conf/server.xml. Again, metrics are the way to detect this--don't just guess.
Question: Are you also running on EC2? What are your tomcat's JVM start up parameters?
You have to check two things:
Are you closing the stream? Very important
Maybe you're giving stream connections "for free". The stream is not large, but many many streams at the same time can steal all your memory. Create a pool so that you cannot have more than a certain number of streams running at the same time
In addition to what John suggested, you should repeatedly flush the output stream. Depending on your web container, it is possible that it caches parts or even all of your output and flushes it at once (for example, to calculate the Content-Length header). That would burn quite a bit of memory.
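One simple way to cap the number of concurrent streams is a counting semaphore. A minimal sketch; the class and method names are made up, and the limit you pass in is something you would have to tune.

```java
import java.util.concurrent.Semaphore;

// Hypothetical limiter: at most maxConcurrent streams may be active at once.
class StreamLimiter {
    private final Semaphore permits;

    StreamLimiter(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Returns true if a slot was free; the caller must call finish() when done.
    boolean tryStart() {
        return permits.tryAcquire();
    }

    // Releases the slot; call this from a finally block after streaming.
    void finish() {
        permits.release();
    }
}
```

In a servlet you would call tryStart() before opening the stream, answer with something like a 503 when it returns false, and call finish() in a finally block.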
If you can structure your files so that the static files are separate and in their own bucket, the fastest performance today can likely be achieved by using the Amazon S3 CDN, CloudFront.
I'm currently trying to figure out where my application has a memory leak. So, I wrote a small test program since my memory leaks seem to be related to the ImageIO.read() method. My test application consists of a simple JFrame with a JButton, which starts the following Action:
public void actionPerformed(ActionEvent e)
{
File folder = new File("C:\\Pictures");
ArrayList<File> files = new ArrayList<File>(Arrays.asList(folder.listFiles()));
try
{
for (File file : files)
ImageIO.read(file);
}
catch (Exception a)
{
a.printStackTrace();
}
}
Although I do NOT save the return value, that is the image, of ImageIO.read, my application has a huge memory allocation (~800 MB). For test reasons, the folder C:\Pictures contains ~23k pictures with a total size of 25GB.
Why does ImageIO.read() reserve that much memory, even after returning and NOT saving the image anywhere else?
It doesn't 'reserve that much memory'. It can't. All that seems to have happened here is that it took about 800 image loads before GC kicked in.
Why are you loading 23k images? It seems a strange thing to do. Where are you going to display them? Do you have an extra-large screen?
In my program I have a loop that scans a bunch of files and reads their content. The problem happened after iterating over about 1500 files and can't seem to be reproduced (or understood (by me))
The problem:
java.io.FileNotFoundException: /path/to/file//myFile (Too many open files)
Exception points to this method:
private static String readFileAsRawString(File f) throws IOException {
FileInputStream stream = new FileInputStream(f); // <------------Stacktrace
try{
FileChannel fc = stream.getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
return Charset.defaultCharset().decode(bb).toString();
} finally {
stream.close();
}
}
I ran this method over 20,000 files in QA and it seems to have no problems.
Do you see anything wrong with the code I pasted above that would cause this issue?
The mapping is suspect. A MappedByteBuffer can outlive its FileChannel, and is valid until it is garbage collected. You might not have enough garbage to run the GC, but perhaps on a particular platform file handles are retained by unreferenced buffers.
Unless explicit garbage collection is disabled (-XX:+DisableExplicitGC), you should be able to test for this by catching the exception, calling System.gc(), and trying again. If it works on the second try, that's your problem. However, calling System.gc() as a permanent fix is a bad idea. The solution that will perform best overall will take some profiling on the target platform.
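As a diagnostic only, the catch, collect, retry test could be sketched like this. The IoCall interface and the class name are made up for illustration; you would wrap the existing call as withGcRetry(() -> readFileAsRawString(f)).

```java
import java.io.IOException;

// Diagnostic sketch, not a fix: retries an I/O call once after forcing a GC.
class GcRetryProbe {
    // Hypothetical functional interface for an operation that may throw IOException.
    interface IoCall<T> {
        T run() throws IOException;
    }

    static <T> T withGcRetry(IoCall<T> call) throws IOException {
        try {
            return call.run();
        } catch (IOException first) {
            // If the retry succeeds, unreferenced buffers were probably still
            // holding file handles until the collector ran.
            System.err.println("Retrying after System.gc(): " + first.getMessage());
            System.gc();
            return call.run();
        }
    }
}
```

If the log line shows up and the second attempt succeeds, the mapping is the likely culprit.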
Don't use MappedByteBuffer for this trivial task. There is no well-defined time at which they are released. Just open the file, read it, close it.
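A minimal sketch of the "open, read, close" version using java.nio.file.Files, which opens and closes the file inside a single call. It takes a Path instead of a File (use f.toPath() to convert) and keeps the default charset as in the question; the class name is made up.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class PlainFileRead {
    // Reads the whole file into memory and decodes it with the platform charset.
    static String readFileAsRawString(Path file) throws IOException {
        // readAllBytes opens the file, reads it fully, and closes it before returning,
        // so no file handle outlives this call.
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, Charset.defaultCharset());
    }
}
```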
I think you open too many files too fast; try adding a wait() to test this.
Then add a static counter that keeps track of open files and, if many files are already open, add a wait mechanism...