Send to printer a specific document layout from code - java

I need to print some PNG files on paper from Java, each one placed below its filename printed as text.
The document style should follow this example:
How can I achieve this?
Thanks in advance.
EDIT:
I have found this nice "IText" guide, and it seems to be exactly what I was looking for.
It only creates PDFs, so it won't send anything to the printer, but it could solve my problem in a nice way. I'll give it a try tomorrow.
IText Guide
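For sending the pages straight to a printer without an intermediate PDF, the JDK's own java.awt.print API may be enough. The following is only a minimal sketch under some assumptions: one image per page, the filename on a single line at the top, and the image scaled down to fit below it. The class name PngPrinter, the helper fitScale and the TEXT_HEIGHT constant are made up for this illustration, and the real layout from the example image is not known here.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class PngPrinter implements Printable {
    // Assumed vertical space (in points) reserved for the filename line.
    private static final int TEXT_HEIGHT = 20;

    private final List<File> files;

    public PngPrinter(List<File> files) {
        this.files = files;
    }

    /** Scale factor that fits an image into a box without ever enlarging it. */
    public static double fitScale(int imgW, int imgH, double boxW, double boxH) {
        return Math.min(1.0, Math.min(boxW / imgW, boxH / imgH));
    }

    @Override
    public int print(Graphics g, PageFormat pf, int pageIndex) throws PrinterException {
        if (pageIndex >= files.size()) {
            return NO_SUCH_PAGE; // tells the print job that there are no more pages
        }
        File file = files.get(pageIndex);
        BufferedImage img;
        try {
            img = ImageIO.read(file);
        } catch (IOException e) {
            throw new PrinterException("Cannot read " + file + ": " + e.getMessage());
        }
        if (img == null) {
            throw new PrinterException("Not a readable image: " + file);
        }

        Graphics2D g2 = (Graphics2D) g;
        // Move the origin to the top-left corner of the printable area.
        g2.translate(pf.getImageableX(), pf.getImageableY());

        // Filename first, then the image underneath it.
        g2.drawString(file.getName(), 0, g2.getFontMetrics().getAscent());
        double s = fitScale(img.getWidth(), img.getHeight(),
                pf.getImageableWidth(), pf.getImageableHeight() - TEXT_HEIGHT);
        g2.drawImage(img, 0, TEXT_HEIGHT,
                (int) (img.getWidth() * s), (int) (img.getHeight() * s), null);
        return PAGE_EXISTS;
    }

    public static void main(String[] args) throws PrinterException {
        if (args.length == 0) {
            System.out.println("usage: java PngPrinter file1.png [file2.png ...]");
            return;
        }
        List<File> files = new ArrayList<>();
        for (String arg : args) {
            files.add(new File(arg));
        }
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintable(new PngPrinter(files));
        if (job.printDialog()) { // lets the user pick a printer
            job.print();
        }
    }
}
```

If several images should share one page, the same print method could loop over a slice of the file list and advance the y offset after each image.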

Related

opencv find block of text areas / detect document layout

I have a color document image containing text, images, and tables.
The document can have two columns.
The document is composed of areas: each area has a header and text (the header uses a bigger font, can have a different font color, and can carry something like a sub-header with additional data).
This is an example image, but the real one can be in color:
What I need to do:
I need to find these text areas with headers in the document image.
What I need to know:
A method for dividing the document into these particular parts.
I tried OpenCV in Java (if someone has a Python or C++ version, I can convert it to Java myself). I found a few similar problems on Stack Overflow, but none of them helped. Note that my OpenCV knowledge is limited and comes only from online tutorials and Stack Overflow.
Is there a good solution to my problem with OpenCV, or do I need to use something else, such as a different library or application?
The one and only requirement is that it must run from the command line.
Once I have these areas I can do what I need next, but this is the step that is blocking me.
Have you solved the problem?
I'm working on a similar problem.
My solution is to use HoughLines: https://docs.opencv.org/3.4.0/d9/db0/tutorial_hough_lines.html
You can use text detection combined with dilation to detect bold text, i.e. headers, and then group the text boxes between two consecutive headers as the text under the first header.
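The grouping step from the last suggestion can be sketched in plain Java, independent of OpenCV. This assumes the detection step has already produced bounding boxes and marked which ones are headers, and that the boxes belong to a single column (for a two-column page, the boxes would first have to be split by x position). The classes Box and Section are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class HeaderGrouping {

    /** A detected text box; isHeader is assumed to come from the earlier detection step. */
    public static class Box {
        public final int x, y, width, height;
        public final boolean isHeader;

        public Box(int x, int y, int width, int height, boolean isHeader) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.isHeader = isHeader;
        }
    }

    /** A header together with the text boxes that follow it. */
    public static class Section {
        public final Box header; // null for text that appears before the first header
        public final List<Box> body = new ArrayList<>();

        Section(Box header) {
            this.header = header;
        }
    }

    /** Sorts boxes top to bottom and attaches each text box to the closest header above it. */
    public static List<Section> group(List<Box> boxes) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingInt((Box b) -> b.y));

        List<Section> sections = new ArrayList<>();
        Section current = null;
        for (Box b : sorted) {
            if (b.isHeader) {
                // A new header closes the previous section and opens a new one.
                current = new Section(b);
                sections.add(current);
            } else {
                if (current == null) {
                    // Text before any header goes into a header-less section.
                    current = new Section(null);
                    sections.add(current);
                }
                current.body.add(b);
            }
        }
        return sections;
    }
}
```

Each resulting Section then corresponds to one "header plus text" area of the page.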

Workaround for known Bug in PDFBox

First of all: my goal is just to load a PDF, highlight words on a page of that PDF, and show that page to the user as an image.
So far I parse the PDF with a custom text stripper to get all word positions with their coordinates (needed to generate the highlight rectangles later).
After that I started generating PDAnnotationTextMarkup annotations. I am now at the point where I can see my annotations fine if I save the PDF to a file and view it with a PDF reader of my choice. But if I use the convertToImage method provided by PDFBox, I only get the plain page rendered, without annotations.
After a little time on Google I found PDFBOX-2019, which was mentioned in another Stack Overflow question.
Now I'm looking for a workaround, because the ticket history suggests that no one will fix that issue within about a year.
Does anybody have a good idea for working around this and achieving my goal?
Thanks in advance,
Ben

Reading from arc file (commoncrawl dataset) with ARCReader

This question may sound stupid, but I spent hours researching a solution and couldn't find one, so if anyone knows, that would be great!
I successfully read an ARC file (from the Common Crawl dataset). With arcHeader.getUrl(); I get all the URLs. However, I don't understand whether the 'outgoing' links from a particular URL are included, and if they are, how to get them.
[PS] By 'outgoing' I mean the URLs that the whole page contains, e.g. as ads, content, etc. Does the Common Crawl ARC file contain them, and if yes, how do I get them?
Thanks in advance!
EDIT: I solved this: I read the HTML content and got them all! It wasn't that difficult!
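For anyone landing here, one possible shape of "read the HTML content and collect the links" is sketched below. It assumes the record body has already been read into a String and that the page URL is the value returned by arcHeader.getUrl(). The class OutgoingLinks and its extract method are hypothetical names. The regex only looks at href attributes of anchor tags and is deliberately crude; a real HTML parser would be more robust, and links in src attributes (images, scripts, many ads) are not covered.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutgoingLinks {
    // Matches the quoted href value of an <a> tag (single or double quotes).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns the links found in the HTML, resolved against the page URL
     * so that relative links become absolute.
     */
    public static List<String> extract(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Malformed link in the page: skip it rather than abort the whole record.
            }
        }
        return links;
    }
}
```

Calling extract once per record, with the URL from the header and the HTML body, gives the outgoing links of that page.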

How to get line count with iText

I am using the iText library to write a PDF file.
I want to add page numbers and a page header on every page of the file.
How can I do that?
If you don't break lines manually, it's really hard, almost impossible, to get the line count precisely. The number depends on font measurement and the text layout on the page. iText is a great tool for PDF generation, not for parsing.
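To illustrate why the count depends on font measurement, here is a rough sketch of a greedy word-wrap line counter in plain Java. It is not how iText lays out text (iText may split and hyphenate differently), so the result is only an estimate. The class LineCounter and the measure parameter are assumptions for this example; measure stands for whatever returns the rendered width of a string, which in iText 5 could plausibly be BaseFont.getWidthPoint(text, fontSize).

```java
import java.util.function.ToDoubleFunction;

public class LineCounter {

    /**
     * Estimates how many lines the text occupies when wrapped at maxWidth.
     * The measure function returns the rendered width of a string in the
     * same unit as maxWidth, so the result changes with font and font size.
     */
    public static int countLines(String text, double maxWidth,
                                 ToDoubleFunction<String> measure) {
        int lines = 0;
        for (String paragraph : text.split("\n", -1)) {
            lines++; // every paragraph takes at least one line, even when empty
            String current = "";
            for (String word : paragraph.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                String candidate = current.isEmpty() ? word : current + " " + word;
                if (!current.isEmpty() && measure.applyAsDouble(candidate) > maxWidth) {
                    // The word does not fit on the current line: start a new one.
                    lines++;
                    current = word;
                } else {
                    current = candidate;
                }
            }
        }
        return lines;
    }
}
```

With a pretend font where every character is 10 units wide, "aaaa bbbb cccc" wraps to 2 lines at width 100 and to 3 lines at width 50, which shows how strongly the count follows the measurement.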
I suggest you buy the iText book "iText in Action", which is very helpful. If you don't generate your page breaks yourself, you could check out the PdfPageEvent methods such as onStartPage() and the getPageNumber() method.

How can I get highlighted words from a PDF file?

I am developing a new program in which I need to let the user highlight words in a PDF file; then I want to process the file to get a list of the highlighted words with their places.
How can I do that in Java?
Thanks in advance.
PDF is based on the PostScript imaging model, which makes it very difficult to process. I doubt there's an easy way.
Take a look at http://java-source.net/open-source/pdf-libraries , but be aware you might have some difficulty.
Also, read http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf for the specs of the highlight format. Depending on what "place" information you need, that might be enough.
How are you displaying the PDF? If you are displaying the image, you just need the word co-ordinates. Something like PdfBox or JPedal or maybe IText can do this.
