We have a requirement where we already have pre printed stationery and want user to put data in a HTML form and be able to print data on that form. Alignment/text size etc are very important since the pre-printed stationery already has boxes for each character. What could be a good way to achieve this in java? I have thinking of using jasper reports. Any other options? May be overlay image with text or something?
Also we might need to capability to print on plain paper in which case the boxes needs to be printed by our application and the form should match after the printed with the already printed blank stationery containing data.
Do we have some open source framework to do such stuff?
Jaspersoft reports -- http://sourceforge.net/projects/jasperreports/
You will then create XML templates, then you will be able to produce a report in PDF, HTML, CSV, XLS, TXT, RTF, and more. It has all the necessary options to customize the report. Used it before and recommend it.
You will create the templates with iReport then write the code for the engine to pass the data in different possible ways.
check http://www.jaspersoft.com/jasperreports
Edit:
You can have background images and overlay the boxes over it and set a limit on the max character size ... and many more
It is very powerful and gives you plenty of options
Here is one of iReport's tutorial for a background image http://ireport-tutorial.blogspot.com/2008/12/background-image-in-ireport.html
The big problem when printing form content that has been filled in electronically, is aligning it correctly on the pre-printed form. You may get content to align for one printer, but when you use another it is completely misaligned.
Fly Software have a form design product called InForm Designer that gets around the problem nicely by allowing users to specify and save vertical and horizontal offsets for printers. This ensures filled in form content is always aligned. I've tried it and it works perfectly. Take a look for yourself here...
http://www.flysoftware.com/products/inform_designer/overview.asp
It might be worth implementing a printer offset similar to InForm's in your own application (if possible).
Some things to think about.
First in terms of the web page, do you want use the stationery as the form layout?
Does it have to be exact?
Combed boxes (one for each character)
Do you want to show it like that on the web page, or deal with the combing later.
How are you going to deal with say a combed 6 digit number. Is this right aligned. What if they enter 7 digits. Same for text. what if it won't fit.
Font choices, we had a lot of fun with W...
How aligned do you want the character within the box, what font limitations does that imply, some of the auto magic software we looked at did crap like change the size of each character.
Combed editing is a nightmare, we display combed, but raise an edit surface the size of the full box on selection.
Another thing that might drive you barking mad, you find find small differences in the size and layout of the boxes, so they look okay from a distance but a column of boxes sort of shifts about by a pixel. Some of testing guys had to lend us their electron microscopes, so we could see how many ink molecules we were out by. :(
Expect to spend a lot of time in the UI side of things, and remember printed stationery changes, so giving yourself some sort of meta description of the form to start with will save you loads of trouble later on.
Related
Let me start off with saying that I've seen some questions in this regard but nothing actually specifically answering my issue.
Let's say I have a template version of a PDF received by someone else. It's a classic PDF that looks like a form but only one part is editable, the other parts are not because they are being filled in with data from an imported excel sheet. (Sadly I can't show an example)
I've been able to do most in iText 7 but I can't figure out for the life of me if I can change a flat text field into an editable form field, like on image 1.
Image 2 shows checkboxes that I want to make editable as well but I can only change their value within java and even then it only shows a cross, not a checkmark (even if I change the checktype).
It's a bit hard to explain but tl:dr is that I just want to know whether I can edit an existing flat PDF in java or if I have to do so in adobe itself.
Thanks
I'm trying to generate an xsl to be printed in a pre-printed sheet which works fine.
Now i want to give the user a better previsualization (in the pdf screen version) adding a background image which emulates the "pre-printed" stuf on the sheet to give the user a "context" of what is he printing.
The question is: Is there any way I can set a background image in xsl (using apache fop) visible only in pdf but not in the printed version of it?
Thank you all for reading or givin any advice.
Although as the comments state, you can't have content in the PDF that does not come out in a physical printed copy, here is one possible work around for you. Depending on how your users are ultimately going to be using FOP for PDF rendering and how your a driving the work flow, it's possible to pass a parameter into an xslt file before the transofrmation phase is run, so potentially, you could do a dual rendering of the same PDF, one that is presented to the user where the background image is enabled, and one that gets printed, you could just set a variable similar to how they do in this Example, and call it something like $isPreview, and just use a simple if or choose statement to check for 'Y' or 'N'.
Since you are sending to a printer, you may even want to take advantage of FOP's ability to generate to Postscript rather than PDF, I've used this feature quite extensively for print documents using FOP while also producing a PDF copy for electronic delivery via email or hosted services, and I've yet to find any discrepancy between the PDF rendering and what is printed after sending a rendered postscript file, so it should work well for you as well.
As I said, this is not truly a solution to your problem as you've presented it, but as a work around, it could get you the desired results if your clever about how you implement it.
I don;t think the statement that it is not possible is true, I am just not sure how to create such a PDF with FOP. Certainly you can add an image field. One would use a button field and place the image in the button. Then you would set the properties of that button to not print (printable false).
PDF support images in fields: https://answers.acrobatusers.com/adding-image-field-form-q41825.aspx
RenderX supports PDF Form fields but I do not see where they support an image inside the button, only text: http://www.renderx.com/reference.html#PDF%20Forms. But they do support setting a field to "printable".
I want to know if there is any solution for the following scenario:
I have an application which uploads the files, after scanning and transcoding them, onto a server. Suppose, an image file is being uploaded which has been tampered with some additional contents over it. Now, as the uploaded file is illegitimate, I want to remove the additional tampered contents and upload just the original part of this image file. Is it possible to do so in Java?
Thanks.
It's not possible to detect in the general case, but there are some heuristic methods available to determine whether an image has been edited. Try using the tools at http://imageedited.com/ to get an idea of what's possible.
Removing the edit is a much more difficult problem, which is probably impossible with current methods.
I'm just speculating here, and I don't know how well it would work in practice, but you could do it if you limit to specific sources of tampering. E.g., suppose you want to remove the logo added to an image by memegenerator.net.
You know in advance what the text looks like and where it is. Create a transparent png template that matches the text. Then sum the differences between the image and template pixel colors, multiplying each by the alpha of the template pixel. Since for this particular logo, it's basically white (although it seems to have a thin black shadow) you would get false positives for a picture with a white part there, so you'd also need to verify that the surrounding pixels are (within a tolerance) not white. It's not clever but it could work for certain sites.
For anything more flexible (e.g., logos on images which have subsequently been resized) you're in to the territory of OCR and TinEye-like image matching, which are more advanced than I could advise you on.
To correctly detect all kinds of "tampering" and filter "illegitimate" from "legitimate" in general, you'd need an artificial intelligence that could understand the meaning and context of what it's seeing. The short answer is: you can't. That's what humans are for.
If this is for a website, probably the best thing you can do is a report button that lets users of your site report images that don't fit with your site's rules.
I have a scenario where I need a Java app to be able to extract content from a PDF file in one of 2 modes: TEXT_ONLY or ALL. In text mode, only visible text ("visible" as if a human being was reading the PDF) is read out into strings. In all mode, all content (text, images, etc.) is read out of the file.
For instance, if a PDF file was to have 1 page in it, and that page had 3 paragraphs of contiguous text, and was word-wrapping 2 images, then TEXT_ONLY would extract all 3 paragraphs, and ALL would extract all 3 paragraphs and both images:
while(page.hasMoreText())
textList.add(page.nextTextChunk());
if(allMode)
while(page.hasMoreImages())
imageList.add(page.nextImage());
I know Apache Tika uses PDFBox under the hood, but am worried that this kind of functionality is shaded/prohibited by Tika (in which case, I probably need to do this directly from PDFBox).
So I ask: is this possible, and if so, which library is more appropriate for me to use? Am I going about this entirely the wrong way? Any pitfalls/caveats I am not considering here?
To expound some aspects of why #markStephens points you towards some resources giving some background on PDF.
In text mode, only visible text ("visible" as if a human being was reading the PDF) is read out into strings.
Your definition "visible" as if a human being was reading the PDF is not yet very well-defined:
Is text 1 pt in size visible? When zooming in, a human can read it; in standard magnification not, though. Which size would be the limit?
Is text in RGB (128, 129, 128) in a background of (128, 128, 128) visible? How different have the colors to be?
Is text displayed in some white noise pattern on a background of some other white noise pattern visible? How different have patterns to be?
Is text only partially on-screen visible? If yes, is one visible pixel enough? And what about some character 'I' in a giant size where the visible page area fits into the dot on the letter?
What about text covered by some annotation which can easily be moved, probably even by some automatically executed JavaScript code in the file?
What about text in some optional content group only visible when printing?
*...
I would expect most available PDF text parsing libraries to ignore all these circumstances and extract the text, at most respecting a crop box. In case of images with added, invisible OCR'ed text the extraction of that text in general is desired.
For instance, if a PDF file was to have 1 page in it, and that page had 3 paragraphs of contiguous text, and was word-wrapping 2 images, then TEXT_ONLY would extract all 3 paragraphs, and ALL would extract all 3 paragraphs and both images:
PDF (in general) does not know about paragraphs, just some groups of glyphs positioned somewhere on the page. Recognizing paragraphs is a task which cannot be guaranteed to work properly as there are heuristics at work. If, furthermore, you have multicolumn text with an irregular separation, maybe even some image in between (making it hard to decide whether there are two columns divided by the image or whether there is one column with an integrated image), you can count on recognition of the text flow let alone text elements like paragraphs, sections, etc. to fail miserably.
If your PDFs are either properly tagged or all generated by a tool chain for which patterns in the created PDF content streams betray text structures, you may be more lucky. In case of the latter, though, your solution would have to be custom-made for that tool chain.
but am worried that this kind of functionality is shaded/prohibited by Tika (in which case, I probably need to do this directly from pdfBox).
There you point towards another point of interest: PDFs can be marked that text extraction is forbidden while they otherwise can be displayed by anyone. While technically PDFs marked like that can be handled just like documents without that mark with just one decoding step (essentially they are encrypted with a publicly known password), doing so is clearly acting against the declared intention of the author and violating his copyright.
So I ask: is this possible, and if so, which library is more appropriate for me to use? Am I going about this entirely the wrong way? Any pitfalls/caveats I am not considering here?
As long as you expect 100% accuracy for generic input, you should reconsider your architecture.
If the PDFs are all you have and a solution as effective is possible is OK, on the other hand, there are multiple possible libraries for you, iText, and PDFBox to name but two while there are more. Which is best for you depends on more factors, e.g. on whether you need some generic solution or all PDFs are created by a tool chain as above.
In any case you'll have to do some programming yourself, though, to fine-tune them for your use case.
I am developing an application which looks like an eBook reader, but its not exactly an eBook reader.
I have a huge text which is divided into various chapters. Now i want to present that text as a book, user should get a feel of reading an eBook which will have various features like GoTo, Search, Table of Contents and most importantly page curl transition between pages.
Now the problem is how to divide the whole content into number of pages. How can I know that, the number of characters that are going to fit into the screen( depending on the screen size and font size). I am totally confused on where to start and how to proceed.
Ultimately I am planning to develop an eBook Reader which will read only one book that is the text which I give.
Please let me know to achieve this, where to put page breaks and where to put the text( in database or resource file).
I'm trying to use Android TextSwitcher UI Component.
It automatically breaks the text according to the screen size. You can add some click listeners to traverse between pages..