I'm creating a web-based label printing system. Every label needs a unique s/n, so when a user decides to create 1000 labels (with the same data), each one must have a unique s/n. The PDF therefore has 1000 pages, which increases the file size.
My problem is that when the user decides to create more copies, the file size gets bigger.
Is there any way I can reduce the file size of the PDF using iText? Or is there any way I can generate the PDF and output it in the browser without saving it to either the server's or the client's HDD?
Thanks for the help!
One approach is to compress the file. It should be highly compressible.
(I imagine that you should be able to generate the PDF on the server side without writing it to disk, though you could use a lot of memory / Java heap in the process. I don't think it is possible to deliver a PDF to the browser without the file reaching the client PC's hard drive in some form.)
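For the server side, something like the following should work (a minimal sketch assuming iText 5 inside a servlet; the servlet name and placeholder content are mine). setFullCompression() turns on the PDF 1.5 compressed object streams, and writing straight into the servlet response avoids touching the server's disk:

```java
import java.io.IOException;
import javax.servlet.http.*;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class LabelServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("application/pdf");
        try {
            Document doc = new Document();
            // Stream the PDF directly into the response; nothing hits the server's disk.
            PdfWriter writer = PdfWriter.getInstance(doc, resp.getOutputStream());
            writer.setFullCompression(); // compressed xref + object streams
            doc.open();
            doc.add(new Paragraph("label content goes here")); // placeholder
            doc.close();
        } catch (DocumentException e) {
            throw new IOException(e);
        }
    }
}
```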
If everything except the s/n is the same for the thousands of labels, you only have to add the shared content once, as a template, and put the s/n text on top of it.
Take a look at PdfTemplate in iText. If I recall correctly, that creates an XObject for the recurring drawing/label/image, and it is exactly the same object every time you use it.
Even with thousands of labels, the only things that grow your document are the s/n text (and each page), but the graphics and text of the label are only added once. That should reduce your file size.
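A minimal sketch of the idea, assuming iText 5 (the coordinates and the stroked rectangle stand in for your real label artwork):

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfTemplate;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.OutputStream;

public class LabelPdf {
    public static void write(OutputStream out, int copies) throws Exception {
        Document doc = new Document();
        PdfWriter writer = PdfWriter.getInstance(doc, out);
        doc.open();

        PdfContentByte cb = writer.getDirectContent();
        // Draw the shared label artwork once; it becomes a single XObject.
        PdfTemplate label = cb.createTemplate(200, 100);
        label.rectangle(5, 5, 190, 90);
        label.stroke();

        BaseFont font = BaseFont.createFont(); // Helvetica
        for (int sn = 1; sn <= copies; sn++) {
            cb.addTemplate(label, 36, 700); // referenced per page, not copied
            cb.beginText();
            cb.setFontAndSize(font, 10);
            cb.setTextMatrix(46, 710);
            cb.showText(String.format("S/N: %06d", sn)); // only this varies
            cb.endText();
            doc.newPage();
        }
        doc.close();
    }
}
```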
I don't know if "cutting" is the right term...
I've got to finish a large and complex report based on a legacy applet system. A colleague and I decided to try to reuse all the logic in the applet to avoid the complexity of writing a lot of subreports. What we did was copy all the logic from the applet, which includes a lot of conditionals/SQL, and build one huge, properly formatted String, so that our Jasper file just calls a method, "myVo.getBody()", besides the header and footer stuff.
Unfortunately we found a problem: some of the text gets lost between pages. I think that as the text gets bigger and reaches Jasper's page limit, for some reason it keeps being written in a non-visible area, and by the time the next page's content starts, part of it has been lost.
For example, there is a list of 19 items and what happens is:
End of 2nd page
1 - item
2 - item
beginning of 3rd page
18 - item
19 - item
Items 3 to 17 are not shown.
Is there any Jasper configuration for this situation?
We tried:
Position Type: Fix Relative to Top and Float
Stretch Type: Relative to Tallest Object and Relative to Band Height
Stretch With Overflow: true and false
I don't think showing the Java code would be useful, as it just uses a StringBuffer to build the String and sets it on the body property of a PreparedDocumentVO so that the Jasper model can consume it. It seems to be some Jasper setting, or else the idea of building one huge String is not as good as we thought.
I would consider breaking the result up.
Jasper formats information based on a relative page size. This means that at some point, when dealing with information that is not likely to fit on a page, Jasper will probably make an assumption that doesn't hold (and your data will likely not be formatted into the page).
If you have an exceptionally long string, consider splitting it up. Besides, people scroll pages down, not sideways, so a document that scrolls heavily to the side is likely to cause user issues unless every record scrolls sideways just as heavily.
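One sketch of the splitting idea: feed Jasper a list of small records (one per paragraph of the big body) instead of a single giant field, so the engine paginates on record boundaries. PreparedDocumentVO is from your question; the field name bodyChunk and the blank-line delimiter are my assumptions:

```java
import java.util.*;
import net.sf.jasperreports.engine.JRDataSource;
import net.sf.jasperreports.engine.JasperFillManager;
import net.sf.jasperreports.engine.JasperPrint;
import net.sf.jasperreports.engine.JasperReport;
import net.sf.jasperreports.engine.data.JRMapCollectionDataSource;

public class BodySplitter {
    public static JasperPrint fill(JasperReport report, Map<String, Object> params,
                                   PreparedDocumentVO vo) throws Exception {
        // One record per paragraph instead of one huge String.
        List<Map<String, ?>> rows = new ArrayList<>();
        for (String paragraph : vo.getBody().split("\n\n")) {
            rows.add(Collections.singletonMap("bodyChunk", paragraph));
        }
        JRDataSource ds = new JRMapCollectionDataSource(rows);
        // The report's detail band would then print the $F{bodyChunk} field.
        return JasperFillManager.fillReport(report, params, ds);
    }
}
```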
I am working on big geographic image files with a size greater than an HDFS block. I need to split the images into several strips (with a height of 100px, for instance), then apply some processing to them, and finally rebuild the final image.
To do so, I have created a custom input format (inheriting from FileInputFormat) and a custom record reader. I split the image in the input format by defining several FileSplits (each corresponding to one strip), which are read in the record reader.
I am not sure my splitting process is optimal, because a strip can span two HDFS blocks, and I don't know how to "send" each split to the best worker (the one that will do the fewest remote reads).
For the moment I am using FileInputFormat.getBlockIndex() with the split's beginning offset in order to get the host of the split.
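For reference, here is roughly what I mean, a sketch of the getSplits() override inside my FileInputFormat subclass (computeStripSize() is a placeholder for my strip-size calculation). Picking the hosts of the block containing the middle byte of the strip would at least assign a strip that straddles two blocks to the block holding most of its bytes:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

@Override
public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = new ArrayList<>();
    for (FileStatus file : listStatus(context)) {
        Path path = file.getPath();
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        long stripBytes = computeStripSize(file); // bytes per 100px strip (placeholder)
        for (long off = 0; off < file.getLen(); off += stripBytes) {
            long len = Math.min(stripBytes, file.getLen() - off);
            // Ask HDFS which hosts hold the middle byte of the strip, so a
            // strip straddling two blocks goes to its "majority" block.
            BlockLocation[] blocks = fs.getFileBlockLocations(file, off + len / 2, 1);
            splits.add(new FileSplit(path, off, len, blocks[0].getHosts()));
        }
    }
    return splits;
}
```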
Do you have any advice to help me solve this problem?
P.S. I am using the new Hadoop API
For image processing on Hadoop, take a look at HIPI (the Hadoop Image Processing Interface): http://hipi.cs.virginia.edu/
If it is realistic to process an entire image in a single mapper, then you may find it simpler to achieve full data locality by making the block size of the image files larger than the size of each image, and to get parallelism by processing multiple images at a time.
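Block size can be set per file when you write it into HDFS, so no cluster-wide change is needed. A minimal sketch (the 512 MB figure is an assumption, just something larger than any single image):

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UploadWholeImage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dst = new Path(args[1]);
        long blockSize = 512L * 1024 * 1024; // assumed larger than any image
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = fs.create(dst, true,
                     conf.getInt("io.file.buffer.size", 4096),
                     fs.getDefaultReplication(dst), blockSize)) {
            IOUtils.copyBytes(in, out, conf, false);
        }
    }
}
```

The same thing should be possible from the shell by passing the dfs.blocksize property (dfs.block.size in older versions) with -D when doing the -put.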
I am using Android PDF Writer (APW) in my app, successfully for the most part. However, when I try to include a high resolution image in a PDF document, I get an out of memory exception.
Immediately before creating the PDF file, the library converts the content itself into a string (representing the raw PDF content), which is then converted to a byte array. The byte array is written to the file through a FileOutputStream (see the example on the website).
The out of memory exception occurs when the string is generated, because representing all the pixels of a bitmap image in string format is very memory intensive. I could downsample the image using the Android API; however, it is essential that the images go into the PDF at high resolution (~2000 x 1000).
There are many scanner-type apps which seem to be able to generate PDFs with high-res images, so there must be a way around it, surely. Granted, they may be using other libraries, but surely someone has figured out a way around it with this library, given that it is free and therefore popular(?)
I emailed the developer, but there was no response.
Potential solutions (that I can think of) include:
Modifying the library to load a string representing e.g. the first 10% of the PDF, and writing it to the file chunk by chunk.
Modifying the library to send a string output stream, or some other output stream, to a temp file (or the final file) as the actual PDF content is being written in the PdfWriter object.
However, as a relative Java noob (and even more of a PDF specification noob), I am unable to understand the library well enough to do this myself.
Has anyone come across this problem and found a way around it? Is anyone willing to hazard a suggestion, or even to take a look at the library itself to see if there is a fix of some sort?
Thanks for your help.
nme32
Edit:
Logcat says the heap size is in the range of 40 to 60 MB before the crash. I understand (do correct me if I'm wrong) that Android limits the memory available to apps depending on what else is running; it is in the 50 MB ballpark, depending on the device.
When loading the image, I think APW essentially converts it to a bitmap, that is, it represents the image pixel by pixel and then puts it into string format, meaning it doesn't matter which image format you use; it may as well be a bitmap.
First of all, the resolution you are mentioning is very high, and I have already covered the issues related to images in Android in this answer.
Secondly, in case the first solution doesn't work for you, I would suggest a disk-based LruCache: store the chunks in the disk-based cache, then retrieve and use them. Here is an example of that.
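A rough sketch of the idea, assuming Jake Wharton's DiskLruCache library (the key scheme and cache size are illustrative, and chunkBytes stands for one chunk of the generated PDF content):

```java
import android.content.Context;
import com.jakewharton.disklrucache.DiskLruCache;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkCache {
    private final DiskLruCache cache;

    public ChunkCache(Context context) throws IOException {
        // A 50 MB cache in the app's cache dir, one value per entry.
        cache = DiskLruCache.open(new File(context.getCacheDir(), "pdf-chunks"),
                1, 1, 50L * 1024 * 1024);
    }

    /** Writes one chunk of PDF content under a sequential key. */
    public void putChunk(int index, byte[] chunkBytes) throws IOException {
        DiskLruCache.Editor editor = cache.edit(String.format("chunk-%04d", index));
        OutputStream out = editor.newOutputStream(0);
        try {
            out.write(chunkBytes);
        } finally {
            out.close();
        }
        editor.commit();
    }

    /** Streams a chunk back so it can be appended to the final PDF file. */
    public InputStream getChunk(int index) throws IOException {
        DiskLruCache.Snapshot snapshot = cache.get(String.format("chunk-%04d", index));
        return snapshot == null ? null : snapshot.getInputStream(0);
    }
}
```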
Hope this helps. If it doesn't, comment on this answer and I will add more solutions.
I am writing to a PDF from a Java servlet, and I was wondering whether the PDF fields can be resized based on the information that is dynamically received. Here is my PDF:
This is to verify that [first_middle_last_name] is entered into......
When I originally made the PDF, I surrounded that space with tons of spaces to be able to handle large names, but now when I get a small name it just looks bad. Is there any way to counter this?
This doesn't seem like a programming question, but the data is written from a Java servlet, and I figured there might be a way to do this in the code. Thanks in advance.
We need to load and display large (rich text) files using Swing, about 50 MB. The problem is that the performance rendering the files is incredibly poor. We tried both JTextPane and JEditorPane with no luck.
Does someone have experience with this and could give me some advice?
thanks,
I don't have any experience with this, but if you really need to load big files I suggest you do some kind of lazy loading with JTextPane/JEditorPane.
Define a limit that JTextPane/JEditorPane can handle well (like 500KB or 1MB). You'll only need to load a chunk of the file of this size into the control.
Start by loading the 1st partition of the file.
Then you need to interact with the scroll container and see if it has reached the end/beginning of the current chunk of the file. If so, show a nice waiting cursor, and load the previous/next chunk into memory and into the text control.
The chunk to load is calculated from your current cursor position in the file (the offset):
loading chunk = offset - limit/2 to offset + limit/2
The text in the JTextPane/JEditorPane must not change when loading chunks, or else the user will feel like they are at a different position in the file.
This is not a trivial solution, but if you don't find any 3rd-party control that does this, it's the way I would go (see the sketch below).
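A minimal sketch of the chunk-loading part, assuming UTF-8 text and a 1 MB limit (splitting a multi-byte character at a chunk boundary is a detail a real implementation would need to handle):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import javax.swing.JTextPane;

public class LazyTextLoader {
    private static final int LIMIT = 1024 * 1024; // chunk size the pane handles well

    private final RandomAccessFile file;
    private final JTextPane pane;

    public LazyTextLoader(RandomAccessFile file, JTextPane pane) {
        this.file = file;
        this.pane = pane;
    }

    /** Loads the chunk [offset - LIMIT/2, offset + LIMIT/2) around the cursor. */
    public void loadChunkAround(long offset) throws IOException {
        long start = Math.max(0, offset - LIMIT / 2);
        long end = Math.min(file.length(), offset + LIMIT / 2);
        byte[] buf = new byte[(int) (end - start)];
        file.seek(start);
        file.readFully(buf);
        pane.setText(new String(buf, StandardCharsets.UTF_8));
    }
}
```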
You could use Memory Mapped File I/O to create a 'window' into the file and let the operating system handle the reading of the file.
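A small sketch of that idea with java.nio, reading one window of the file (the window size and the UTF-8 assumption are mine):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWindow {
    /** Maps only the region being viewed; the OS pages the file in and out. */
    public static String readWindow(Path path, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = Math.min(length, channel.size() - offset);
            MappedByteBuffer window =
                    channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
            byte[] bytes = new byte[(int) size];
            window.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}
```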
Writing an efficient WYSIWYG text editor that can handle large documents is a pretty hard problem; even Word has problems when you get into large books.
Swing is general purpose, but you have to build a toolset around it that manages documents separately and pages them.
You might look at OpenOffice; you can embed an OO document editor screen right into your app. I believe it's called OOoBean...
JTextPane/JEditorPane do not handle even 1 MB of text well (especially text with long lines).
You can try JEdit's StandaloneTextArea; it is much faster than the Swing text components, but I doubt it will handle this much text. I tried it with a 45 MB file, and while it loaded (~25 seconds) and I could scroll down, I started getting OutOfMemoryError with a 1700 MB heap.
In order to build a really scalable solution, there are two obvious options:
Use pagination. You can do just fine with standard Swing by displaying the text in pages.
Build a custom text renderer. It can be as simple as a scrollable pane where only the visible part is drawn, using a BufferedReader to skip to the desired line in the file and read a limited number of lines to display. I have done it before and it is a workable solution. If you need text-selection capabilities, that is a little more work, of course.
For really large files, you could build an index file that contains the offset of each line in characters, so that getting the offset is a quick "random access" lookup by line number, and reading the text is a "skip" to this offset. Very large files can be viewed with this technique (see the sketch below).
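A sketch of the line-offset index idea. The index is kept in memory here (for very large files it could itself be written to disk), and a single-byte encoding with '\n' line endings is assumed so that character offsets equal byte offsets:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LineIndexedFile {
    private final RandomAccessFile file;
    private final List<Long> lineOffsets = new ArrayList<>();

    public LineIndexedFile(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
        long offset = 0;
        lineOffsets.add(0L);
        try (BufferedReader reader =
                     Files.newBufferedReader(path, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                offset += line.length() + 1; // single-byte chars, '\n' endings
                lineOffsets.add(offset);
            }
        }
    }

    /** Reads 'count' lines starting at 'firstLine' by seeking to the stored offset. */
    public String readLines(int firstLine, int count) throws IOException {
        long start = lineOffsets.get(firstLine);
        long end = lineOffsets.get(Math.min(firstLine + count, lineOffsets.size() - 1));
        byte[] buf = new byte[(int) (end - start)];
        file.seek(start);
        file.readFully(buf);
        return new String(buf, StandardCharsets.ISO_8859_1);
    }
}
```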