Toiling with HTMLEditorKit - java

I'm a novice Java programmer trying to use HTMLEditorKit to traverse an HTML document and alter it to my liking (mostly for the fun of it; what I'm doing could easily be done by hand).
My problem is this: after I have modified my HTML file, I am left with an HTMLDocument that I have no clue how to save back to an HTML file.
HTMLEditorKit kit = new HTMLEditorKit();
File file = new File("local file");
HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
// Ignore any charset declared inside the HTML itself
doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
// Leftover from reading over the network instead of from a file:
// new InputStreamReader(url.openConnection().getInputStream());
FileReader HTMLReader = new FileReader(file);
kit.read(HTMLReader, doc, 0);
After that I do my thing with the "doc" object.
Now that I'm done with that, I just want to save it back, preferably overwriting the file I got the HTML from in the first place.
Can anyone tell me how to save the modified HTMLDocument into an HTML file afterwards?

You can use the write method of the HTMLEditorKit class. Sample code:
FileWriter writer = new FileWriter("local file");
try {
kit.write(writer, doc, 0, doc.getLength());
} finally {
writer.close();
}
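For completeness, here is a minimal sketch of the whole read, modify, write cycle in one place. The class name HtmlRoundTrip and the method insertAfterAndSave are made up for illustration, the inserted text is only a stand-in for whatever changes you make to the document, and UTF-8 is an assumption about the file's encoding.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of any library.
final class HtmlRoundTrip {

    /**
     * Loads the HTML file, inserts {@code text} right after the first
     * occurrence of {@code marker} in the document text, and overwrites
     * the same file with the result.
     */
    static void insertAfterAndSave(Path file, String marker, String text)
            throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore any charset declared inside the HTML itself
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        // Assumption: the file is UTF-8 encoded
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            kit.read(reader, doc, 0);
        }

        // Stand-in for "doing my thing" with the document
        String content = doc.getText(0, doc.getLength());
        int idx = content.indexOf(marker);
        if (idx < 0) {
            throw new IllegalArgumentException("Marker not found: " + marker);
        }
        doc.insertString(idx + marker.length(), text, null);

        // Write the whole document back over the original file
        try (Writer writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            kit.write(writer, doc, 0, doc.getLength());
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("sample", ".html");
        Files.write(file, "<html><body><p>Hello</p></body></html>"
                .getBytes(StandardCharsets.UTF_8));
        insertAfterAndSave(file, "Hello", " world");
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }
}
```

Note that opening the writer truncates the file, so a failed write would lose the original. Writing to a temporary file first and then moving it into place would be safer. Also expect the saved markup to be reformatted by the kit rather than preserved byte for byte.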

Related

How to copy/move AcroForm fields from one document to new blank one using IText5 or IText7?

I need to copy the whole AcroForm, including field positions and values, from a template PDF to a new blank PDF file. How can I do that?
In short, I need to get rid of the "background" from the template and leave only the form fields.
The whole point of this is to create a PDF with content that will be printed on pre-printed templates.
I am using iText 5, but I can switch to 7 if useful examples are provided.
After a lot of trial and error I have found the solution to "How to copy AcroForm fields into another PDF". It is an iText 7 version. I hope it will help somebody someday.
private byte[] copyFormElements(byte[] sourceTemplate) throws IOException {
PdfReader completeReader = new PdfReader(new ByteArrayInputStream(sourceTemplate));
PdfDocument completeDoc = new PdfDocument(completeReader);
ByteArrayOutputStream out = new ByteArrayOutputStream();
PdfWriter offsetWriter = new PdfWriter(out);
PdfDocument offsetDoc = new PdfDocument(offsetWriter);
offsetDoc.initializeOutlines();
PdfPage blank = offsetDoc.addNewPage();
PdfAcroForm originalForm = PdfAcroForm.getAcroForm(completeDoc, false);
// originalForm.getPdfObject().copyTo(offsetDoc,false);
PdfAcroForm offsetForm = PdfAcroForm.getAcroForm(offsetDoc, true);
for (String name : originalForm.getFormFields().keySet()) {
PdfFormField field = originalForm.getField(name);
PdfDictionary copied = field.getPdfObject().copyTo(offsetDoc, false);
PdfFormField copiedField = PdfFormField.makeFormField(copied, offsetDoc);
offsetForm.addField(copiedField, blank);
}
offsetDoc.close();
completeDoc.close();
return out.toByteArray();
}
Did you check the PdfCopyForms object:
Allows you to add one (or more) existing PDF document(s) to create a new PDF and add the form of another PDF document to this new PDF.
I didn't find an example, but you could try something like this:
PdfReader reader1 = new PdfReader(src1); // a document with a form
PdfReader reader2 = new PdfReader(src2); // a document without a form
PdfCopyForms copy = new PdfCopyForms(new FileOutputStream(dest));
copy.addDocument(reader2); // add the document without the form
copy.copyDocumentFields(reader1); // add the fields of the document with the form
copy.close();
reader1.close();
reader2.close();
I see that the class is deprecated. I'm not sure if that's because iText 7 makes this much easier, or because there were technical problems with the class.

Java XMLOutputter changing values of other nodes in XML document

I am using the below method to produce an XML document using getFile() to get the same file as my SAXBuilder.
SAXBuilder reader = new SAXBuilder();
Document document = reader.build(new File(file)); //file example: Desktop/test.camproj (which is an XML document)
When I write the file I use the code below. I change only one value in the XML document (which works), but when I open the video (this XML file is a video file), the values of all nodes of type Callout have been changed to their defaults. Is this the correct way of creating the file, or am I doing something wrong? If I don't open the video and instead open the file in Notepad, nothing except the node I changed has been altered.
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.setFormat(Format.getPrettyFormat());
FileOutputStream output = new FileOutputStream(getFile());
xmlOutput.output(document, output);
document.detachRootElement();
output.close();
My change to the XML node (I am using \n\r to represent line breaks in the XML):
newText = "{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fswiss\\fprq2\\fcharset0 Open Sans"
+ ";}}\n\r{\\colortbl ;\\red0\\green0\\blue0;}\n\r\\viewkind4\\uc1\\pard\\qc\\cf1\\f0\\fs36 " + newText + "\\par\n\r}\n\r;";
document.getRootElement().getChild("CSMLData").getChild("GoProject").getChild("Project")
.getChild("Timeline").getChild("GenericMixer").getChild("Tracks").getChildren().get(index2).getChild("Medias").getChildren().get(index)
.getChild("Attributes").getChild("Attribute").getChild("VectorNode").getChild("StringParameters")
.getChildren().get(3).getChild("Keyframes").getChild("Keyframe").setAttribute("value", newText);

Convert docx file into PDF with Java

I'm looking for a "stable" method to convert DOCX files from MS Word into PDF. Until now I have used OpenOffice installed as a listener, but it often hangs. The problem is that we have situations where many users want to convert SXW and DOCX files into PDF at the same time. Is there some other possibility? I tried the examples from this site: https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/ but the output is not good (the converted documents have errors and the layout is noticeably changed).
Here is the "source" DOCX document:
Here is the document converted with docx4j, with some exception text inside the document. The text in the upper right corner is also missing.
This one is the PDF created with OpenOffice as the DOCX-to-PDF converter. Some text is missing in the upper right corner.
Is there some other option to convert docx into pdf with Java?
There are a lot of ways to do the conversion.
One commonly used method is to use POI and docx4j:
InputStream is = new FileInputStream(new File("your Docx PAth"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(is);
List sections = wordMLPackage.getDocumentModel().getSections();
for (int i = 0; i < sections.size(); i++) {
wordMLPackage.getDocumentModel().getSections().get(i)
.getPageDimensions();
}
Mapper fontMapper = new IdentityPlusMapper();
PhysicalFont font = PhysicalFonts.getPhysicalFonts().get(
"Comic Sans MS");//set your desired font
fontMapper.getFontMappings().put("Algerian", font);
wordMLPackage.setFontMapper(fontMapper);
PdfSettings pdfSettings = new PdfSettings();
org.docx4j.convert.out.pdf.PdfConversion conversion = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
wordMLPackage);
//To turn off logger
List<Logger> loggers = Collections.<Logger> list(LogManager
.getCurrentLoggers());
loggers.add(LogManager.getRootLogger());
for (Logger logger : loggers) {
logger.setLevel(Level.OFF);
}
OutputStream out = new FileOutputStream(new File("Your OutPut PDF path"));
conversion.output(out, pdfSettings);
System.out.println("DONE!!");
This works perfectly; I have tried it on multiple DOCX files.

Merge documents to create TOC in iText (Java)

When creating documents with iText that need a table of contents, I have usually used a process where I create the main document in memory, create the TOC as a separate document in memory (using dummy links), merge them as a third document, and then use a PdfStamper to reconcile the links into the document and write it to a file.
This works with all versions of iText except the most recent (5.5.6). I will include a simple program that does this process (the real programs are much more complex). When running this with iText 5.5.5 or earlier, it creates the desired document (2 pages with the first page containing text that provides a link to open the second page). With 5.5.6 the call to makeRemoteNamedDestinationsLocal causes an exception com.itextpdf.text.pdf.PdfDictionary cannot be cast to com.itextpdf.text.pdf.PdfArray.
As this had always worked until the latest version, I have some suspicion that this may be a bug in the newest version. Is this a bug, or am I doing something wrong? How should I do this task if it is not a bug? Additionally, how are bug reports usually submitted for iText? From the website, it looks like they expect a question to be submitted here as a report.
import com.itextpdf.text.*;
import com.itextpdf.text.pdf.*;
import com.itextpdf.text.pdf.draw.*;
import java.io.*;
// WORKS CORRECTLY USING itext version 5.5.5
// FAILS WITH 5.5.6
// CAUSES AN EXCEPTION
// "com.itextpdf.text.pdf.PdfDictionary cannot be cast to com.itextpdf.text.pdf.PdfArray"
// with makeRemoteNamedDestinationsLocal()
public class testPdf {
public static void main (String[] args) throws Exception {
// Create simple document
ByteArrayOutputStream main = new ByteArrayOutputStream();
Document doc = new Document(new Rectangle(612f,792f),54f,54f,36f,36f);
PdfWriter pdfwrite = PdfWriter.getInstance(doc,main);
doc.open();
doc.add(new Paragraph("Testing Page"));
doc.close();
// Create TOC document
ByteArrayOutputStream two = new ByteArrayOutputStream();
Document doc2 = new Document(new Rectangle(612f,792f),54f,54f,36f,36f);
PdfWriter pdfwrite2 = PdfWriter.getInstance(doc2,two);
doc2.open();
Chunk chn = new Chunk("<<-- Link To Testing Page -->>");
chn.setRemoteGoto("DUMMY.PDF","page-num-1");
doc2.add(new Paragraph(chn));
doc2.close();
// Merge documents
ByteArrayOutputStream three = new ByteArrayOutputStream();
PdfReader reader1 = new PdfReader(main.toByteArray());
PdfReader reader2 = new PdfReader(two.toByteArray());
Document doc3 = new Document();
PdfCopy DocCopy = new PdfCopy(doc3,three);
doc3.open();
DocCopy.addPage(DocCopy.getImportedPage(reader2,1));
DocCopy.addPage(DocCopy.getImportedPage(reader1,1));
DocCopy.addNamedDestination("page-num-1",2,new PdfDestination(PdfDestination.FIT));
doc3.close();
// Fix references and write to file
PdfReader finalReader = new PdfReader(three.toByteArray());
// Fails on this line
finalReader.makeRemoteNamedDestinationsLocal();
PdfStamper stamper = new PdfStamper(finalReader,new FileOutputStream("Testing.pdf"));
stamper.close();
}
}
You have detected a bug that was introduced in iText 5.5.6. This has already been fixed in our repository:
Thank you for reporting this bug. You can find the fix on github: https://github.com/itext/itextpdf/commit/eac1a4318e6c31b054e0726ad44d0da5b8a720c2

Converting a pdf to word document using java

I've successfully converted JPEG to PDF using Java, but I don't know how to convert PDF to Word using Java. The code for converting JPEG to PDF is given below.
Can anyone tell me how to convert PDF to Word (.doc/.docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
public class JpegToPDF {
public static void main(String[] args) {
try {
Document convertJpgToPdf = new Document();
PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
"c:\\java\\ConvertImagetoPDF.pdf"));
convertJpgToPdf.open();
Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
convertJpgToPdf.add(convertJpg);
convertJpgToPdf.close();
System.out.println("Successfully Converted JPG to PDF in iText");
} catch (Exception i1) {
i1.printStackTrace();
}
}
}
In fact, you need two libraries, both open source. The first is iText, which is used to extract the text from a PDF file. The second is POI, which is used to create the Word document.
The code is quite simple:
//Create the word document
XWPFDocument doc = new XWPFDocument();
// Open the pdf file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
// Extract the text
String text=strategy.getResultantText();
// Create a new paragraph in the word document, adding the extracted text
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
run.setText(text);
// Adding a page break
run.addBreak(BreakType.PAGE);
}
// Write the word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: with the extraction strategy used here, you will lose all formatting. You can fix this by plugging in your own, more complex extraction strategy.
You can use the 7-PDF library.
Have a look at this; it may help:
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: iText has some issues when the given file is a non-RGB image, so try this out!
Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows you to connect to it through a TCP port, and that connection can be used to convert documents. If this looks like an acceptable solution, JODConverter can help you.
