Swing Large Files Performance - java

We need to load and display large (rich text) files using Swing, about 50 MB. The problem is that rendering performance is incredibly poor. We tried both JTextPane and JEditorPane with no luck.
Does anyone have experience with this and could give me some advice?
Thanks.

I don't have any experience with this, but if you really need to load big files I suggest you do some kind of lazy loading with JTextPane/JEditorPane.
Define a limit that JTextPane/JEditorPane can handle well (say 500 KB or 1 MB). You then only load a chunk of the file of that size into the control.
Start by loading the first chunk of the file.
Then listen to the scroll container and check whether it has reached the end/beginning of the current chunk. If so, show a wait cursor, load the previous/next chunk into memory and put it into the text control.
The chunk to load is calculated from the current cursor position in the file (offset):
loading chunk = offset - limit/2 to offset + limit/2
The text in the JTextPane/JEditorPane must not jump when chunks are swapped, or the user will feel they are at a different position in the file.
This is not a trivial solution, but if you can't find a third-party control that does it for you, this is the way I would go. A rough sketch of the idea follows.
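A minimal sketch of that chunking idea (plain text only; the ChunkedViewer class is hypothetical, and preserving the caret/scroll position and any rich-text formatting when chunks are swapped is left out):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import javax.swing.*;

/** Keeps only a window ("chunk") of a big file inside the text component. */
public class ChunkedViewer {
    private static final int LIMIT = 1 << 20;   // 1 MB window; tune to what the component handles well
    private final RandomAccessFile file;
    private final JTextArea textArea = new JTextArea();
    private long chunkStart;                    // file offset of the text currently shown

    public ChunkedViewer(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
    }

    /** Loads the window centred on the given offset: [offset - LIMIT/2, offset + LIMIT/2]. */
    public void loadChunkAround(long offset) throws IOException {
        chunkStart = Math.max(0, offset - LIMIT / 2);
        long end = Math.min(file.length(), offset + LIMIT / 2);
        byte[] buf = new byte[(int) (end - chunkStart)];
        file.seek(chunkStart);
        file.readFully(buf);
        // Note: splitting multi-byte characters at the chunk edges is not handled here.
        textArea.setText(new String(buf, StandardCharsets.UTF_8));
        textArea.setCaretPosition(0);
    }

    public JComponent component() {
        JScrollPane scroll = new JScrollPane(textArea);
        // When the user hits the bottom/top of the chunk, page to the neighbouring chunk.
        // A real implementation would also guard against the events fired by setText() itself.
        scroll.getVerticalScrollBar().addAdjustmentListener(e -> {
            JScrollBar bar = (JScrollBar) e.getSource();
            boolean atBottom = bar.getValue() + bar.getVisibleAmount() >= bar.getMaximum();
            boolean atTop = bar.getValue() == 0 && chunkStart > 0;
            try {
                if (atBottom)    loadChunkAround(chunkStart + LIMIT);
                else if (atTop)  loadChunkAround(chunkStart);
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        });
        return scroll;
    }
}
```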

You could use memory-mapped file I/O to create a 'window' into the file and let the operating system handle reading it.
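For example, here is a sketch using java.nio to map only a window of the file (the file name and the 1 MB window size are placeholders; you would remap as the user scrolls):

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class MappedWindow {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("big-file.rtf", "r");
             FileChannel channel = raf.getChannel()) {

            long windowStart = 0;   // offset of the window into the file
            int windowSize = (int) Math.min(1 << 20, channel.size() - windowStart);

            // Map only the window; the OS pages the bytes in on demand.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, windowStart, windowSize);

            byte[] bytes = new byte[buffer.remaining()];
            buffer.get(bytes);
            String text = new String(bytes, StandardCharsets.UTF_8);
            // ...hand 'text' to the text component, and remap when the user scrolls past the window.
            System.out.println(text.length() + " characters mapped");
        }
    }
}
```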

Writing an efficient WYSIWYG text editor that can handle large documents is a pretty hard problem; even Word has trouble once you get into large books.
Swing is general purpose, but you have to build a toolset around it that manages documents separately and pages them in.
You might look at OpenOffice; you can embed an OO document editor screen right into your app. I believe it's called OOBean...

JTextPane/JEditorPane do not handle even 1 MB of text well (especially text with long lines).
You can try JEdit (StandaloneTextArea) - it is much faster than the Swing text components, but I doubt it will handle this much text. I tried with a 45 MB file: it loaded (in about 25 seconds) and I could scroll down, but I started getting OutOfMemoryError with a 1700 MB heap.
To build a really scalable solution there are two obvious options:
Use pagination. You can do just fine with standard Swing by displaying the text in pages.
Build a custom text renderer. It can be as simple as a scrollable pane where only the visible part is drawn, using a BufferedReader to skip to the desired line in the file and read a limited number of lines to display. I have done this before and it is a workable solution. If you need text selection, that is a little more work, of course.
For really large files you can build an index file that contains the offset of each line, so getting the offset is a quick RandomAccessFile lookup by line number, and reading the text is a seek to that offset. Very large files can be viewed with this technique. A sketch of such an index is below.
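A rough sketch of that line-offset index (byte offsets; single-byte/ASCII text is assumed and character-encoding handling is omitted):

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Indexes the byte offset of every line once, then reads any line range directly. */
public class LineIndex {
    private final List<Long> offsets = new ArrayList<>();
    private final RandomAccessFile file;

    public LineIndex(String path) throws IOException {
        file = new RandomAccessFile(path, "r");
        long pos = 0;
        offsets.add(0L);   // line 0 starts at offset 0
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(path))) {
            int b;
            while ((b = in.read()) != -1) {
                pos++;
                if (b == '\n') offsets.add(pos);   // next line starts after the newline
            }
        }
    }

    /** Reads up to 'count' lines starting at 'firstLine' (0-based). */
    public String read(int firstLine, int count) throws IOException {
        file.seek(offsets.get(firstLine));
        StringBuilder sb = new StringBuilder();
        String line;
        for (int i = 0; i < count && (line = file.readLine()) != null; i++) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```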

Related

Jasper Reports cutting large String between pages

I don't know if "cutting" is the right term...
I've got to finish a large and complex report based on a legacy Applet system. A colleague and I decided to reuse all the logic in the applet to avoid the complexity of writing a lot of sub-reports. What we did was copy all the logic from the applet, which includes a lot of conditionals/SQL, and build one huge, properly formatted String, so that our Jasper file just calls a method "myVo.getBody()" besides the header and footer stuff.
Unfortunately we found that part of the text gets lost between pages. I think that as the text gets bigger and reaches Jasper's page limit, for some reason it keeps being written in a non-visible area, and when the next page's content starts, part of it has been lost.
For example, there is a list of 19 items and what happens is:
End of 2nd page
1 - item
2 - item
beginning of 3rd page
18th - item
19th - item
Items 3 to 17 are not shown.
Is there any Jasper configuration for this situation?
We tried:
Position Type: Fix Relative to Top and Float
Stretch Type: Relative to Tallest Object and Relative to Band Height
Stretch With Overflow: true or false
I don't think showing the Java code would be useful, as it just uses a StringBuffer to build the String and puts it into the body property of a PreparedDocumentVO so that the Jasper model can consume it. It seems to be some Jasper setting, or the idea of creating a huge String is not as good as we thought.
I would consider breaking the result up.
Jasper formats information based on a relative page size. This means that at some point, when dealing with information that is not likely to fit on a page, Jasper will probably make an assumption that doesn't hold (and your data will likely not be laid out on the page).
If you have an exceptionally long string, consider splitting it up (a rough sketch follows). Besides, people scroll web pages down, not sideways, so a document that scrolls heavily sideways is likely to cause usability issues unless every record scrolls sideways just as heavily.
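As an illustration of the splitting idea (the chunk size and the notion of feeding each piece to the report as a separate record are assumptions, not something taken from the original report):

```java
import java.util.ArrayList;
import java.util.List;

public class BodySplitter {
    /** Splits a huge report body into page-sized pieces, preferring to break on line boundaries. */
    public static List<String> split(String body, int maxChars) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (start < body.length()) {
            int end = Math.min(start + maxChars, body.length());
            int lastNewline = body.lastIndexOf('\n', end);
            if (lastNewline > start) {
                end = lastNewline + 1;   // don't cut a line in half
            }
            parts.add(body.substring(start, end));
            start = end;
        }
        return parts;
    }
}
```

Each piece could then be handed to the report as its own record (for example through a collection data source), so the engine paginates record by record instead of overflowing one giant text field.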

restart SAX parser from the middle of the document

I'm working on a project that needs to parse a very big XML file (about 10 GB). Because the processing time is really long (days), it's possible that my code will exit in the middle of the process, so I want to save my code's status once in a while and then be able to restart it from the last save point.
Is there a way to start (restart) a SAX parser somewhere other than the beginning of an XML file?
P.S.: I'm programming in Python, but solutions for Java and C++ are also acceptable.
Not really sure if this answers your question, but I would take a different approach. 10 GB is not THAT much data, so you could implement two-phase parsing.
Phase 1 would split the file into smaller chunks based on some tag, so you end up with several smaller files. For example, if your first file is A.xml, you split it into A_0.xml, A_1.xml, etc.
Phase 2 would do the real heavy lifting on each chunk, so you invoke it on A_0.xml, then on A_1.xml, etc. You can then restart at the chunk where your code exited. A rough splitting sketch is shown below.
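A rough sketch of phase 1 in Java (the </record> tag name, the chunk size and the input file name are placeholder assumptions; each chunk also still needs the XML declaration and root element wrapped around it before a parser will accept it):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

/** Splits A.xml into A_0.xml, A_1.xml, ... after every RECORDS_PER_CHUNK closing tags. */
public class XmlSplitter {
    private static final int RECORDS_PER_CHUNK = 100_000;
    private static final String CLOSING_TAG = "</record>";   // whatever element repeats in the real file

    public static void main(String[] args) throws IOException {
        int chunk = 0;
        int records = 0;
        BufferedWriter out = newChunk(chunk);
        try (BufferedReader in = new BufferedReader(new FileReader("A.xml"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
                if (line.contains(CLOSING_TAG) && ++records == RECORDS_PER_CHUNK) {
                    out.close();
                    out = newChunk(++chunk);   // start the next chunk file
                    records = 0;
                }
            }
        }
        out.close();
    }

    private static BufferedWriter newChunk(int n) throws IOException {
        return new BufferedWriter(new FileWriter("A_" + n + ".xml"));
    }
}
```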

Android pdf writer APW high resolution images cause out of memory exception

I am using android pdf writer (apw) in my app successfully for the most part. However, when I try to include a high resolution image in a pdf document, I get an out of memory exception.
Immediately before creating the pdf file, the library converts the content into a string (representing the raw pdf content), which is then converted to a byte array. The byte array is written to the file with a file output stream (see the example on the website).
The out of memory exception occurs when the string is generated, because representing all the pixels of a bitmap image in string format is very memory intensive. I could downsample the image using the Android API; however, it is essential that the images go into the pdf at high resolution (~2000 x 1000).
There are many scanner-type apps which seem to be able to generate pdfs with high-res images, so there must be a way around it, surely. Granted, they may be using other libraries, but surely someone has figured out a way around it with this library, given that it is free and therefore popular(?)
I emailed the developer, but there was no response.
Potential solutions I can think of include:
Modifying the library to load a string representing e.g. the first 10% of the PDF, and writing it to the file chunk by chunk.
Modifying the library to write to a string output stream, or some other output stream, to a temp file (or the final file) as the actual pdf content is being built in the pdfwriter object.
However, as a relative java noob (and even more of a pdf specification noob), I am unable to understand the library well enough to do this myself.
Has anyone come across this problem and found a way around it? Is anyone willing to hazard a suggestion, or even take a look at the library to see if there is a fix of some sort?
Thanks for your help.
nme32
Edit:
Logcat says the heap size is in the range of 40 to 60 MB before the crash. I understand (do correct me if not) that Android limits the memory available to apps depending on what else is running, though it is in the 50 MB ballpark, depending on the device.
When loading the image, I think APW essentially converts it to a bitmap, that is, it represents the image pixel by pixel and then puts it into string format, meaning it doesn't matter which image format you use; it may as well be a bitmap.
First of all, the resolution you are mentioning is very high, and I have already mentioned the issues related to images in Android in this answer.
Secondly, in case the first solution doesn't work for you, I would suggest a disk-based LruCache: store the chunks in that disk-based cache, then retrieve and use them. Here is an example of that.
Hope this helps. If it doesn't, comment on this answer and I will add more solutions.
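To illustrate the chunk-by-chunk idea from the question in the most general way (this is not the APW API; the pdfContent string simply stands in for whatever the library produces), the point is to stream the content to disk instead of materialising one giant byte array on top of it:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ChunkedPdfWrite {
    /** Writes a large PDF-content string to disk without building a full byte[] copy first. */
    public static void writeInChunks(String pdfContent, String path) throws IOException {
        final int chunkSize = 64 * 1024;   // 64 KB at a time
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(path), StandardCharsets.ISO_8859_1)) {
            for (int start = 0; start < pdfContent.length(); start += chunkSize) {
                int end = Math.min(start + chunkSize, pdfContent.length());
                out.write(pdfContent, start, end - start);   // avoids a pdfContent.getBytes() copy
            }
        }
    }
}
```

This only removes the extra byte-array copy; the string itself still has to fit in the heap, so the real win would come from the library writing to such a stream while it builds the content, as suggested in the question.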

Reading large images from HDFS in mapreduce

There is a very large image (~200 MB) in HDFS (block size 64 MB). I want to know the following:
How do I read the image in a MapReduce job?
Many topics suggest WholeInputFormat. Are there any other alternatives, and how do I do it?
When WholeInputFormat is used, will there be any parallel processing of the blocks? I guess not.
If your block size is 64 MB, HDFS will most probably have split your image file into chunks and replicated them across the cluster, depending on your cluster configuration.
Assuming that you want to process your image file as one record rather than as multiple blocks/lines, here are a few options I can think of to process the image file as a whole.
You can implement a custom input format and a record reader. The isSplitable() method in the input format should return false. The RecordReader.next(LongWritable pos, RecType val) method should read the entire file and set val to the file contents. This ensures that the entire file goes to one map task as a single record.
You can sub-class the input format and override the isSplitable() method so that it returns false. This example shows how to create a sub-class of SequenceFileInputFormat to implement a NonSplittableSequenceFileInputFormat. A sketch of the first option is shown below.
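Here is a sketch of that first option using the newer org.apache.hadoop.mapreduce API (the next(key, value) call mentioned above belongs to the older mapred API, but the idea is the same); note that the whole file is buffered in memory, so the ~200 MB must fit in the task's heap:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

/** Sketch of a non-splittable input format that hands each file to a mapper as one record. */
public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // never split: one file = one map task
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
        private FileSplit split;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) return false;
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FileSystem fs = file.getFileSystem(conf);
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.readFully(in, contents, 0, contents.length);   // whole file into memory
            }
            value.set(contents, 0, contents.length);
            processed = true;
            return true;
        }

        @Override public NullWritable getCurrentKey()    { return NullWritable.get(); }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress()             { return processed ? 1.0f : 0.0f; }
        @Override public void close()                    { }
    }
}
```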
I guess it depends on what type of processing you want to perform. If you are trying to do something that can be done by first splitting the big input into smaller image files, processing those blocks independently, and finally stitching the output parts back into one large final output, then it may be possible. I'm no image expert, but suppose you want to turn a colour image into grayscale: you could cut the large image into small images, convert them in parallel using MR, and once the mappers are done stitch them back into one large grayscale image.
If you understand the format of the image, you can write your own record reader to help the framework understand the record boundaries and prevent corruption when they are fed to the mappers.
Although you can use WholeFileInputFormat or SequenceFileInputFormat or something custom to read the image file, the actual issue (in my view) is to get something useful out of the file you have read. OK, you have read the file, now what? How are you going to process your image to detect any object inside your mapper? I'm not saying it's impossible, but it would require a lot of work.
IMHO, you are better off using something like HIPI. HIPI provides an API for performing image processing tasks on top of the MapReduce framework.
Edit:
If you really want to do it your way, you will need to write a custom InputFormat. Since images are not like text files, you can't use delimiters like \n to create splits. One possible workaround is to create splits based on a given number of bytes. For example, if your image file is 200 MB, you could write an InputFormat which creates splits of 100 MB (or whatever you pass as a parameter in your job configuration). I faced such a scenario long ago while dealing with some binary files, and this project helped me a lot.
HTH

Reduce PDF file size in itext (java)

I'm creating a web-based label printing system. For every label there should be a unique s/n, so when a user decides to create 1000 labels (with the same data), each one should have a unique s/n; the pdf will therefore have 1000 pages, which increases the file size.
My problem is that when the user decides to create more copies, the file gets even bigger.
Is there any way I can reduce the file size of the pdf using iText? Or is there any way I can generate the pdf and output it in the browser without saving it to either the server's or the client's HDD?
Thanks for the help!
One approach is to compress the file. It should be highly compressible.
(I imagine that you should be able to generate the PDF on the server side without writing it to disc, though you could use a lot of memory / Java heap in the process. I don't think it is possible to deliver a PDF to the browser without the file going to the client PC's hard drive in some form.)
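For the server side, one way (a sketch, assuming iText 5 and a plain servlet; the class names and label content are placeholders) is to write the document straight to the HTTP response stream, so nothing is ever saved to the server's disk:

```java
import java.io.IOException;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

public class LabelPdfServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("application/pdf");
        try {
            Document document = new Document();
            // Write directly to the response output stream: no file on the server side.
            PdfWriter.getInstance(document, resp.getOutputStream());
            document.open();
            for (int sn = 1; sn <= 1000; sn++) {
                document.add(new Paragraph("Label, s/n " + sn));
                document.newPage();
            }
            document.close();
        } catch (DocumentException e) {
            throw new IOException(e);
        }
    }
}
```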
If everything except the s/n is the same for the thousands of labels, you only have to add the identical content once as a template and put the s/n text on top of it.
Take a look at PdfTemplate in iText. If I recall correctly, that creates an XObject for the recurring drawing/label/image... and it is exactly the same object every time you use it.
Even with thousands of labels, the only thing that grows your document is the s/n (and each page), but the graphics or text of the 'label' are only added once. That should reduce your file size. A sketch is shown below.
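A sketch of that PdfTemplate idea with iText 5 (the artwork, coordinates and file name are placeholders): the static label is drawn into one XObject and reused on every page, while only the s/n changes.

```java
import java.io.FileOutputStream;

import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfTemplate;
import com.itextpdf.text.pdf.PdfWriter;

public class TemplateLabels {
    public static void main(String[] args) throws Exception {
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("labels.pdf"));
        document.open();

        PdfContentByte cb = writer.getDirectContent();
        BaseFont font = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.NOT_EMBEDDED);

        // The shared label artwork goes into one XObject that every page reuses.
        PdfTemplate label = cb.createTemplate(400, 200);
        label.rectangle(10, 10, 380, 180);
        label.stroke();
        label.beginText();
        label.setFontAndSize(font, 14);
        label.showTextAligned(PdfContentByte.ALIGN_LEFT, "Static label artwork", 20, 160, 0);
        label.endText();

        for (int sn = 1; sn <= 1000; sn++) {
            if (sn > 1) {
                document.newPage();
            }
            cb.addTemplate(label, 100, 500);            // reused object, not re-drawn on each page
            document.add(new Paragraph("s/n " + sn));   // the only per-page difference
        }
        document.close();
    }
}
```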
