I'm working with Aspose, in particular Aspose.Words, and I'm using it to convert a document (.doc) to a PDF in Java/JavaFX. Now I want to convert a simple URL, www.google.com for example, into a PNG or, more generally, an image. Is this possible?
Yes, you can meet this requirement using the Aspose.Words for Java API. For example, the following code converts a web page to a multi-page TIFF image:
LoadOptions opts = new LoadOptions();
opts.setLoadFormat(LoadFormat.HTML);
Document doc = new Document("http://www.google.com", opts);
doc.save(getMyDir() + "out.tiff");
I used iText 5 to create a nice-looking report that includes some tables and graphs. I wonder whether iText lets you convert PDF to HTML and, if so, how one can do it.
I believe previous versions of iText allowed it, but in iText 5 I was not able to find a way to do this.
No. iText has never converted PDF to HTML, only the reverse.
Have you had a look at http://www.jpedal.org/pdf_to_html_conversion.php - there is currently a free beta.
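This is possible with Apache Tika (it uses Apache PDFBox under the hood). Using PDFBox directly:
// PDFBox 1.x API; the PDFText2HTML constructor may differ in later versions.
public String pdfToHtml(InputStream content) throws IOException {
    PDDocument pddDocument = PDDocument.load(content);
    try {
        PDFText2HTML stripper = new PDFText2HTML("UTF-8");
        return stripper.getText(pddDocument);
    } finally {
        pddDocument.close(); // release the document's resources
    }
}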
I am developing a module that is supposed to print documents from the server. The requirements are as follows:
The module should be able to print a PDF from a URL, with or without saving it first.
The module should be able to accept page numbers as parameters and print or save only those pages.
The module should be able to accept the printer name as a parameter and use only that printer.
Is there any library available for this? How should I go about implementing it?
The answer was Apache PDFBox. I was able to load the PDF into a PDDocument object like this:
PDDocument pdf = PDDocument.load(new URL(download_pdf_from).openStream());
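For the "with saving" case, one option is to copy the URL's bytes to a local file first and load the document from that file afterwards. The following is only a minimal sketch using the JDK; the class and method names (PdfDownload, saveTo) are hypothetical and not part of PDFBox.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PdfDownload {
    /** Copies the bytes behind the URL into target, replacing any existing file. */
    static Path saveTo(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }
}
```

The saved file can then be opened with PDFBox instead of the URL stream, so the download and the processing steps stay independent.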
Splitting the document was as easy as:
Splitter splitter = new Splitter();
List<PDDocument> splittedDocuments = splitter.split(pdf);
Now, to get a reference to any particular page:
splittedDocuments.get(pageNo);
Saving the entire document, or just a given page:
pdf.save(path); //saving the entire document to device
splittedDocuments.get(pageNo).save(path); //saving a particular page number to device
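To accept page numbers as a parameter, the input has to be mapped to positions in the split list. Below is a rough sketch that assumes the splitter produced one document per page, in order, and that callers pass 1-based pages in a format such as "1-3,5". The format and the names PageSelection and toIndexes are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class PageSelection {
    /**
     * Turns a spec such as "1-3,5" (1-based, inclusive) into zero-based list indexes.
     * Throws IllegalArgumentException for pages outside 1..pageCount.
     */
    static List<Integer> toIndexes(String spec, int pageCount) {
        List<Integer> indexes = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0].trim());
            int last = bounds.length > 1 ? Integer.parseInt(bounds[1].trim()) : first;
            if (first < 1 || last > pageCount || first > last) {
                throw new IllegalArgumentException("Bad page range: " + part.trim());
            }
            for (int page = first; page <= last; page++) {
                indexes.add(page - 1); // list positions start at 0
            }
        }
        return indexes;
    }
}
```

Each returned index can be passed to splittedDocuments.get(...) to save or print only the requested pages.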
For the printing part, this answer helped me.
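For the printer-name requirement, the JDK's javax.print API can list the printers known to the JVM, and one of them can be picked by name. This is a sketch under the assumption that a case-insensitive exact match on the name is good enough; PrinterLookup and its methods are hypothetical names.

```java
import java.util.Optional;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

final class PrinterLookup {
    /** Returns the first service whose name matches, ignoring case. */
    static Optional<PrintService> findByName(PrintService[] services, String printerName) {
        for (PrintService service : services) {
            if (service.getName().equalsIgnoreCase(printerName)) {
                return Optional.of(service);
            }
        }
        return Optional.empty();
    }

    /** Searches all print services the JVM can see. */
    static Optional<PrintService> findInstalled(String printerName) {
        return findByName(PrintServiceLookup.lookupPrintServices(null, null), printerName);
    }
}
```

A service found this way can be handed to java.awt.print.PrinterJob via setPrintService before printing, so the job goes only to that printer.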
I need to convert a docx to a PDF. The following code uses the XDocReport library and works pretty well.
The problem is with some specific docx files that contain drawings: the drawings are not visible in the final PDF. I tested the conversion with the live demo available from the GitHub page and got the same problem.
So I'm wondering: is this possible, or do I need to use another library? Which one? (docx4j doesn't seem to work either.)
final XWPFDocument document = new XWPFDocument(inputStream);
final OutputStream outPdf = new FileOutputStream("myFile.pdf");
PdfConverter.getInstance().convert(document, outPdf, optionsPdf);
outPdf.close();
XDocReport doesn't support drawings. It could support them, since the docx->pdf conversion is based on iText, which supports drawing, but it's a big task (contributions are welcome!).
You can see the limitations of the XDocReport docx->pdf converter here.
To summarize my problem: I would like to convert an XLS file to PDF using Java.
I found two examples.
The first uses OpenOffice:
import officetools.OfficeFile; // from officetools.jar
FileInputStream fis = new FileInputStream(new File("test.doc"));
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
OfficeFile f = new OfficeFile(fis,"localhost","8100", false);
f.convert(fos,"pdf");
But unfortunately I would have to install it :(
I also found this example, two command lines in VB (calling PDF Creator):
DoCmd.OpenReport "repClient", acViewPreview, "NumClient = 2"
DoCmd.OutputTo acOutputReport, "PDF", "d:\test.pdf"
Is there something like that in Java?
(Note: for my first solution I used jxl and Apache POI, but the formatting of the generated PDF is not the same as when I use Save as PDF in Microsoft Excel.)
Thank you in advance.
I think you can stream the data from the Excel document using the Apache POI library and pass that stream of data to the iText library API.
The iText API has functions for writing stream data into a PDF file. With iText you can be fairly confident about the PDF formatting, as it is widely used in organizations for PDF generation. In fact, many reporting tools also use iText to generate PDF reports.
I am trying to generate a PDF document from a *.doc document.
So far, thanks to Stack Overflow, I have succeeded in generating it, but with some problems.
My sample code below generates the PDF without formatting or images, just the text.
The document includes blank spaces and images, which are not included in the PDF.
Here is the code:
in = new FileInputStream(sourceFile.getAbsolutePath());
out = new FileOutputStream(outputFile);
WordExtractor wd = new WordExtractor(in);
String text = wd.getText();
Document pdf= new Document(PageSize.A4);
PdfWriter.getInstance(pdf, out);
pdf.open();
pdf.add(new Paragraph(text));
docx4j includes code for creating a PDF from a docx using iText. It can also use POI to convert a doc to a docx.
There was a time when we supported both methods equally (as well as PDF via XHTML), but we decided to focus on XSL-FO.
If it's an option, you'd be much better off using docx4j to convert a docx to PDF via XSL-FO and FOP.
Use it like so:
wordMLPackage = WordprocessingMLPackage.load(new java.io.File(inputfilepath));
// Set up font mapper
Mapper fontMapper = new IdentityPlusMapper();
wordMLPackage.setFontMapper(fontMapper);
// Example of mapping missing font Algerian to installed font Comic Sans MS
PhysicalFont font
= PhysicalFonts.getPhysicalFonts().get("Comic Sans MS");
fontMapper.getFontMappings().put("Algerian", font);
org.docx4j.convert.out.pdf.PdfConversion c
= new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(wordMLPackage);
// = new org.docx4j.convert.out.pdf.viaIText.Conversion(wordMLPackage);
OutputStream os = new java.io.FileOutputStream(inputfilepath + ".pdf");
c.output(os);
Update July 2016
As of docx4j 3.3.0, Plutext's commercial PDF renderer is docx4j's default option for docx to PDF conversion. You can try an online demo at converter-eval.plutext.com
If you want to use the existing docx to XSL-FO to PDF (or other target supported by Apache FOP) approach, then just add the docx4j-export-FO jar to your classpath.
Either way, to convert docx to PDF, you can use the Docx4J facade's toPDF method.
The old docx to PDF via iText code can be found at https://github.com/plutext/docx4j-export-FO/.../docx4j-extras/PdfViaIText/
WordExtractor just grabs the plain text, nothing else. That's why all you're seeing is the plain text.
What you'll need to do is get each paragraph individually, then grab each run, fetch the formatting, and generate the equivalent in PDF.
One option may be to find some code that turns XHTML into a PDF. Then use Apache Tika to turn your Word document into XHTML (it uses POI under the hood and handles all the formatting for you), and go from the XHTML on to PDF.
Otherwise, if you're going to do it yourself, take a look at the code in Apache Tika for parsing Word files. It's a really great example of how to get at the images, the formatting, the styles, etc.
I have successfully used Apache FOP to convert a 'WordML' document to PDF. WordML is the Office 2003 way of saving a Word document as XML. XSLT stylesheets can be found on the web to transform this XML to XSL-FO, which in turn can be rendered by FOP into PDF (among other outputs).
It's not so different from the solution plutext offered, except that it doesn't read a .doc document, whereas docx4j apparently does. If your requirements are flexible enough to have WordML style documents as input, this might be worth looking into.
Good luck with your project!
Wim
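The XSLT step described in this answer (WordML in, XSL-FO out) can be run with the JDK's built-in javax.xml.transform API. The sketch below only shows how a stylesheet is applied; it works on strings for brevity, the class name XsltStep is hypothetical, and a real WordML-to-FO stylesheet as well as the FOP rendering step are not included.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltStep {
    /** Applies the stylesheet to the input XML and returns the result as a string. */
    static String transform(String inputXml, String stylesheetXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
        StringWriter result = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(inputXml)),
                new StreamResult(result));
        return result.toString();
    }
}
```

For real documents, the StreamSource and StreamResult objects would more likely wrap files than strings; the resulting XSL-FO is what FOP would then render.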
Use OpenOffice/LibreOffice and JODConverter
This also mostly works for .doc to .docx, though there are problems with graphics that I have not yet worked out.
private static void transformDocXToPDFUsingJOD(File in, File out)
{
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
DocumentFormat pdf = converter.getFormatRegistry().getFormatByExtension("pdf");
converter.convert(in, out, pdf);
}
private static OfficeManager officeManager;
@BeforeClass
public static void setupStatic() throws IOException {
/*officeManager = new DefaultOfficeManagerConfiguration()
.setOfficeHome("C:/Program Files/LibreOffice 3.6")
.buildOfficeManager();
*/
officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();
officeManager.start();
}
@AfterClass
public static void shutdownStatic() throws IOException {
officeManager.stop();
}
You need to be running LibreOffice as a server to make this work.
From the command line you can do this using:
"C:\Program Files\LibreOffice 3.6\program\soffice.exe" -accept="socket,host=0.0.0.0,port=8100;urp;LibreOffice.ServiceManager" -headless -nodefault -nofirststartwizard -nolockcheck -nologo -norestore
Another option I came across recently is using the OpenOffice (or LibreOffice) API (see here). I have not been able to get into this but it should be able to open documents in various formats and output them in a pdf format. If you look into this, let me know how it worked!
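If the LibreOffice server from the JODConverter answer should be started by the Java application itself rather than by hand, the same soffice command line can be passed to ProcessBuilder. This is a sketch only: the flags are copied from the command line quoted earlier, while the class name OfficeServer and the idea of passing the executable path and port as parameters are assumptions.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class OfficeServer {
    /** Builds the argument list for a headless office process listening on the given port. */
    static List<String> command(String sofficePath, int port) {
        return Arrays.asList(
                sofficePath,
                // No shell is involved, so the accept string needs no surrounding quotes.
                "-accept=socket,host=0.0.0.0,port=" + port + ";urp;LibreOffice.ServiceManager",
                "-headless", "-nodefault", "-nofirststartwizard",
                "-nolockcheck", "-nologo", "-norestore");
    }

    /** Starts the process; the caller should call destroy() on it when done. */
    static Process start(String sofficePath, int port) throws IOException {
        return new ProcessBuilder(command(sofficePath, port)).inheritIO().start();
    }
}
```

Keeping the argument list in its own method makes it possible to log or check the command without actually launching the office process.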