I have an application that reads a PDF, transforms the content into a collection of TIF files, and sends them to a GlassFish Server for saving.
Usually there are 1-5 pages and it works fine, but when I got an input file with 100+ pages...
it throws an error on the transfer.
Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
Adding more resources is not a good option in my case, so I'm looking for a way to optimize it somehow.
I store the data in:
HashMap<TifProfile, List<byte[]>>
Is there a better way to store or send them?
EDIT
I did some tests, and the final collections for a PDF with 80 pages
have a size of over 280 MB (240 TIFFs with different settings inside).
Well, you don't give us much to go on, but it seems clear to me that storing 100+ high-resolution TIFF-encoded images in memory will very quickly exhaust any resources you have available.
It might be better to load it into the database in batches, e.g. just handle 5 pages at a time.
Alternatively, depending on your JDBC driver, you might be able to stream the image data into a JDBC BLOB so you won't have to cache it in memory. Here's some food for thought...
http://artofsystems.blogspot.co.uk/2008/07/mysql-postgresql-and-blob-streaming.html
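As a rough sketch of the BLOB-streaming idea (assuming a JDBC 4 driver and a made-up tif_pages(profile, page_no, data) table), you could write each page to a temporary TIFF file and let the driver pull it from a stream, so the whole image never has to sit in a byte[] on the heap:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class TiffBlobUploader {

    // Streams one TIFF page from a temp file straight into a BLOB column,
    // so the image bytes never have to be held in memory all at once.
    public void storePage(Connection conn, String profileName, int pageNo, File tiffFile) throws Exception {
        String sql = "INSERT INTO tif_pages (profile, page_no, data) VALUES (?, ?, ?)";
        try (InputStream in = new FileInputStream(tiffFile);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, profileName);
            ps.setInt(2, pageNo);
            ps.setBinaryStream(3, in, tiffFile.length()); // driver reads from the stream, not from a byte[]
            ps.executeUpdate();
        }
    }
}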
We are emulating a P2P network in Java. We divide the file into chunks (with checksums) so that the individual chunks can be recompiled into the original file once we have all the parts. What is the best way to store the individual parts while they are being downloaded?
I was thinking of just storing each chunk as a separate file... but if there are 20,000 chunks, that would create as many files. Is this the best way?
Thanks
Either keep the chunks in memory or in files. There is not much to discuss here: find the right ratio between chunk count and chunk size to suit your needs.
Files sound more reasonable, since the data would not be totally lost if the application crashes, and the download could be resumed.
I would write to memory until you reach some threshold, at which point you dump your memory to disk, and keep reading into memory. When the file transfer completes, you can take what is currently stored in memory, and concatenate it with what may have been stored on disk.
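Something like this minimal sketch of the threshold-and-spill idea; the class name, the 16 MB threshold and the single append-only spill file are just illustrative assumptions:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SpillingChunkBuffer {

    private static final int THRESHOLD = 16 * 1024 * 1024; // flush to disk after ~16 MB; tune to your heap

    private final File spillFile;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();

    public SpillingChunkBuffer(File spillFile) {
        this.spillFile = spillFile;
    }

    public void writeChunk(byte[] chunk) throws IOException {
        memory.write(chunk);
        if (memory.size() >= THRESHOLD) {
            flushToDisk();          // dump the in-memory buffer and keep going
        }
    }

    private void flushToDisk() throws IOException {
        try (OutputStream out = new FileOutputStream(spillFile, true)) { // append to the spill file
            memory.writeTo(out);
        }
        memory.reset();
    }

    // When the transfer completes, append whatever is still in memory;
    // the spill file then holds the reassembled download.
    public File finish() throws IOException {
        flushToDisk();
        return spillFile;
    }
}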
My website allows users to upload profile images, which are then represented in several (about 3 or 4) different sizes around the site.
As the site grows, there is always the possibility that the image sizes will need to be tweaked, or new image sizes will be needed later.
How does a site like Facebook or Twitter handle this? Do they process the image into different sizes right at the upload time, or do they store higher quality images and process them server-side when needed?
Is there a common way to handle this?
The different sized images would most likely be cached somewhere, rather than processed at access-time each time. When the upload occurs, you would create all the sizes you will need and store them (in files or your database). This method uses the most disk space to store all image sizes, but places all the processing load at the moment of upload, allowing for faster access later.
Alternatively, if the load isn't expected to be heavy, you could create the different sizes the first time they are accessed and then store them for future use. This method uses less disk space by only creating images that are actually used, but slows down the first access to each image size. Future accesses would be fast, hitting the cached image.
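A rough sketch of that generate-on-first-access variant, assuming resized copies are cached as JPEG files in a hypothetical image-cache directory and resizing is done with plain Graphics2D:

import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class LazyImageCache {

    private final File cacheDir = new File("image-cache");

    public File getResized(File original, int width, int height) throws IOException {
        File cached = new File(cacheDir, original.getName() + "_" + width + "x" + height + ".jpg");
        if (cached.exists()) {
            return cached;          // fast path: already generated on an earlier request
        }
        BufferedImage source = ImageIO.read(original);
        BufferedImage resized = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = resized.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(source, 0, 0, width, height, null);
        g.dispose();
        cacheDir.mkdirs();
        ImageIO.write(resized, "jpg", cached);  // store for future requests
        return cached;
    }
}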
Addendum for larger loads
Consider performing the image processing on a separate worker server. Ideally, the front end and the worker servers would share the storage mount to which images are uploaded and stored, saving transfer bandwidth between them. At the moment of upload of the original, the main application places the image in a queue for processing by the worker. The images cannot be available for use until processed, but the processing load remains independent of the front end, so it does not have much direct impact on the end-user experience.
Depending on just how many uploads you expect per minute, the worker process could be as simple as a cron job running every minute to poll a table of pending upload tasks (registered by the main application), perform the conversions, and update the table when they have been completed. If one minute is too long to wait however, you would need a continuous running worker process to be polling for new tasks. Obviously this is more complex to implement.
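As a very rough sketch (assuming a made-up pending_resize(id, image_path, status) table and with the actual resize call elided), the cron-driven worker could look something like this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ResizeWorker {

    public static void main(String[] args) throws Exception {
        // Connection details are placeholders; run this from cron, or wrap the
        // body in a loop with a sleep for a continuously running worker.
        try (Connection conn = DriverManager.getConnection("jdbc:...", "user", "password")) {
            try (PreparedStatement select = conn.prepareStatement(
                         "SELECT id, image_path FROM pending_resize WHERE status = 'PENDING'");
                 ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    String path = rs.getString("image_path");
                    // ... generate the required sizes for 'path' here ...
                    try (PreparedStatement update = conn.prepareStatement(
                            "UPDATE pending_resize SET status = 'DONE' WHERE id = ?")) {
                        update.setLong(1, id);
                        update.executeUpdate();     // mark the task as completed
                    }
                }
            }
        }
    }
}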
No matter what you do though, do not regenerate the alternative image sizes each time you need them. Store them somewhere.
They process the image into different sizes right at the upload time.
Upload happens once, viewing could happen thousands of times. Processing on upload reduces the work for the server.
If you want to get a better result:
Upload the image from the client at the highest resolution you will need, then scale it down on the server to the smaller required sizes and save those images' byte arrays in the database: the original image, a thumbnail of the original, and the other sizes.
Send the client only the image it actually needs to display, e.g. the thumbnail for the profile picture and gallery views, and the original image when a thumbnail is clicked. Keep the thumbnail and original image URLs similar.
My suggestions:
Put limits on file size, resolution and format of the upload and store the original image.
Alternatively: Impose less strict limits, convert the original image, store the converted image instead of the original.
Thumbnails are (presumably) accessed the most often and are (presumably) the smallest, so it makes sense to store them. Storing an image that is, say, a fifth (or less) of the size of the original is hardly significant. These can be stored upon upload.
If you don't want to store all the other-sized images, some sort of least-recently-used or most-often-used cache may be in order (see the sketch at the end of this answer). These should be reasonably simple to implement with unique image IDs, a queue of IDs for each size, and a directory that stores the cached images with file names generated from the unique ID and the desired image size.
If the sum of all the other image sizes is much less than the original image size (say less than half), it wouldn't be the worst idea to just store all of them.
I'm sure finding an open-source Java image resizer wouldn't be much effort.
ImageIO is a decent, but reasonably basic, image I/O API. It doesn't resize images itself; that's typically done with Graphics2D or Image.getScaledInstance on top of it. I see it has some caching functionality.
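Here's a rough sketch of the least-recently-used cache mentioned above, built on LinkedHashMap in access order; the imageId_size key scheme and the capacity are illustrative assumptions:

import java.util.LinkedHashMap;
import java.util.Map;

public class ResizedImageLruCache extends LinkedHashMap<String, String> {

    // Keys are "imageId_size" strings, values are the file names of the cached resized images.
    private static final int MAX_ENTRIES = 1000;

    public ResizedImageLruCache() {
        super(16, 0.75f, true);   // access-order = true gives LRU eviction order
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        if (size() > MAX_ENTRIES) {
            // Here you would also delete the evicted file from the cache directory.
            return true;
        }
        return false;
    }
}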
I came across the FileBackedOutputStream class from the Google Guava library and was wondering if it's suitable to be used as a kind of buffer: once every day, a process in my webapp generates tens of thousands of lines (each containing about 100 characters) which are then uploaded to a file on an FTP server. I was thinking of using a FileBackedOutputStream object to first write all these strings to and then give access to them to my FTP client by using FileBackedOutputStream.getSupplier().getInput(), which returns an InputStream. Would this be a correct use case for FileBackedOutputStream?
Yes, I think that would be an acceptable use case for FileBackedOutputStream. However, I think FileBackedOutputStream is best when you're using it with data that may vary in size considerably... for small amounts of data that can fit in memory without a problem you want to just buffer them in memory but for large amounts of data that might give you an OutOfMemoryError if you try to read it all in to memory, you want to switch to buffering to a file. This is where FileBackedOutputStream really shines I think. I've used it for buffering uploaded files that I need to do several things with.
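A minimal sketch of that use case, assuming an arbitrary 1 MB threshold (below it the lines stay in memory, above it Guava spills to a temp file); newer Guava versions expose asByteSource() in place of the getSupplier().getInput() call from the question:

import com.google.common.io.FileBackedOutputStream;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ReportUploadBuffer {

    public InputStream buffer(List<String> lines) throws Exception {
        FileBackedOutputStream out = new FileBackedOutputStream(1024 * 1024); // 1 MB threshold
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
        // Hand this stream to the FTP client; the data comes from memory or
        // from the backing temp file, whichever Guava ended up using.
        return out.asByteSource().openStream();
    }
}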
Happy Holidays everyone! I know people's brains are switched off right now, but can you please give me solutions or best practices for caching images in Java or J2ME? I am trying to load images from a server (input stream) and I want to be able to store them so I don't have to retrieve them from the server every time they need to be displayed again. Thank you.
The approach you'll want probably depends on the number of images as well as their typical file size. For instance, if you're only likely to use a small number of images or small-sized images, the example provided by trashgod makes a lot of sense.
If you're going to be downloading a very large number of images, or images with very large file sizes, you may consider caching the images to disk first. Then, your application could load and later dispose the images as needed to minimize memory usage. This is the kind of approach used by web browsers.
The simplest approach is to use whatever collection is appropriate to your application and funnel all image access though a method that checks the cache first. In this example, all access is via an image's index, so getImage() manipulates a cache of type List<ImageIcon>. A Map<String, ImageIcon> would be a straightforward alternative.
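A minimal sketch of funnelling all access through a cache-checking method, here keyed by image name and using a made-up server URL:

import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import javax.swing.ImageIcon;

public class ImageCache {

    private final Map<String, ImageIcon> cache = new HashMap<>();

    public ImageIcon getImage(String name) throws Exception {
        ImageIcon icon = cache.get(name);
        if (icon == null) {
            // Fetched from the server only once; subsequent calls hit the map.
            icon = new ImageIcon(new URL("http://example.com/images/" + name));
            cache.put(name, icon);
        }
        return icon;
    }
}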
The only way to do this in J2ME is to save the image's raw byte array (i.e. the one you pass to Image.createImage()) somewhere persistent, possibly a file using JSR 75 but more likely a record store using RMS.
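A rough sketch of the RMS route, with the record store name being an assumption:

import javax.microedition.lcdui.Image;
import javax.microedition.rms.RecordStore;

public class RmsImageCache {

    // Saves the raw image bytes and returns the record id for later lookup.
    public int saveImageBytes(byte[] raw) throws Exception {
        RecordStore store = RecordStore.openRecordStore("imageCache", true);
        try {
            return store.addRecord(raw, 0, raw.length);
        } finally {
            store.closeRecordStore();
        }
    }

    // Reads the bytes back and rebuilds the Image with Image.createImage().
    public Image loadImage(int recordId) throws Exception {
        RecordStore store = RecordStore.openRecordStore("imageCache", true);
        try {
            byte[] raw = store.getRecord(recordId);
            return Image.createImage(raw, 0, raw.length);
        } finally {
            store.closeRecordStore();
        }
    }
}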
HTH
I have a file of size 2 GB which has student records in it. I need to find students based on certain attributes in each record and create a new file with the results. The order of the filtered students should be the same as in the original file. What's the most efficient and fastest way of doing this using the Java I/O API and threads without running into memory issues? The max heap size for the JVM is set to 512 MB.
What kind of file? Text-based, like CSV?
The easiest way would be to do something like grep does: Read the file line by line, parse the line, check your filter criterion, if matched, output a result line, then go to the next line, until the file is done. This is very memory efficient, as you only have the current line (or a buffer a little larger) loaded at the same time. Your process needs to read through the whole file just once.
I do not think multiple threads are going to help much. It would make things much more complicated, and since the process seems to be I/O bound anyway, trying to read the same file with multiple threads probably does not improve throughput.
If you find that you need to do this often, and going through the file each time is too slow, you need to build some kind of index. The easiest way to do that would be to import the file into a DB (can be an embedded DB like SQLite or HSQL) first.
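Here's a minimal sketch of that grep-style pass, assuming a text file with one CSV record per line; the matches() check is a stand-in for whatever attribute filter you actually need:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class StudentFilter {

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("students.txt"));
             BufferedWriter out = new BufferedWriter(new FileWriter("filtered.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {   // only one line held in memory at a time
                if (matches(line)) {
                    out.write(line);
                    out.newLine();                      // output order follows input order
                }
            }
        }
    }

    private static boolean matches(String record) {
        // Hypothetical filter: keep records whose third CSV field is "2023".
        String[] fields = record.split(",");
        return fields.length > 2 && "2023".equals(fields[2].trim());
    }
}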
I wouldn't overcomplicate this until you find that the boringly simple way doesn't work for what you need. Essentially you just need to:
open input stream to 2GB file, remembering to buffer (e.g. by wrapping with BufferedInputStream)
open output stream to filtered file you're going to create
read first record from input stream, look at whatever attribute to decide if you "need" it; if you do, write it to output file
repeat for remaining records
On one of my test systems with extremely modest hardware, BufferedInputStream around a FileInputStream out of the box read about 500 MB in 25 seconds, i.e. probably under 2 minutes to process your 2GB file, and the default buffer size is basically as good as it gets (see the BufferedInputStream timings I made for more details). I imagine with state of the art hardware it's quite possible the time would be halved.
Whether you need to go to a lot of effort to reduce the 2-3 minutes or just go for a wee while you're waiting for it to run is a decision that you'll have to make depending on your requirements. I think the database option won't buy you much unless you need to do a lot of different processing runs on the same set of data (and there are other solutions to this that don't automatically mean a database).
2 GB for a file is huge; you SHOULD go for a DB.
If you really want to use Java I/O API, then try out this: Handling large data files efficiently with Java and this: Tuning Java I/O Performance
I think you should use memory-mapped files. They let you map a bigger file into a smaller region of memory, acting like virtual memory, and as far as performance is concerned, mapped files are faster than stream read/write.
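A minimal sketch of the memory-mapped approach, mapping the file in windows because a single MappedByteBuffer is capped at roughly 2 GB; the window size and file name are assumptions:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("students.dat", "r");
             FileChannel channel = raf.getChannel()) {
            long fileSize = channel.size();
            long windowSize = 256L * 1024 * 1024;   // map 256 MB at a time
            long position = 0;
            while (position < fileSize) {
                long length = Math.min(windowSize, fileSize - position);
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, position, length);
                // ... scan the buffer for matching records here ...
                position += length;
            }
        }
    }
}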