Converting an HTML file or string with docx4j throws an error when the code runs
public static void convertHtmltoWord2(String html) throws Exception {
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
// Convert the HTML, and add it into the empty docx we made
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
XHTMLImporter.setHyperlinkStyle("Hyperlink");
wordMLPackage.getMainDocumentPart().getContent().addAll(
XHTMLImporter.convert(html, baseURL) );
wordMLPackage.save(new java.io.File("C:\\Converted_Word.docx") );
}
It fails with the following error:
java.util.NoSuchElementException
at org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart.<init>(MainDocumentPart.java:76)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.createPackage(WordprocessingMLPackage.java:432)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.createPackage(WordprocessingMLPackage.java:421)
Any idea why it's not working?
Related
This is the code I use to convert MathML to PNG:
public static void main(String[] args) throws IOException, FontFormatException {
Converter converter = Converter.getInstance();
String math="<math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mfenced open=\"[\" close=\"]\"><mtable><mtr><mtd><mn>3</mn></mtd><mtd><mn>2</mn></mtd></mtr><mtr><mtd><mn>6</mn></mtd><mtd><mn>7</mn></mtd></mtr><mtr><mtd><mn>6</mn></mtd><mtd><mn>4</mn></mtd></mtr></mtable></mfenced><mo>=</mo><msubsup><mo>∫</mo><mn>5</mn><mn>4</mn></msubsup></math>";
File inputFile = new File("D:\\mathml.xml");
File outputFile = new File("D:\\image.jpg");
// Layout parameters that control the size of the rendered image
MutableLayoutContext params = new LayoutContextImpl(
LayoutContextImpl.getDefaultLayoutContext());
params.setParameter(Parameter.MATHSIZE, 50f);
Document doc = StringToDocumentToString.convertStringToDocument(math);
// Parameter parameter= new Pa
converter.convert(doc, outputFile , "image/jpeg", params);
}
I want to use Tahoma.ttf when rendering the PNG from MathML, but I cannot find any resource on how to do that. Can anyone help?
I have a Maven project with docx4j. I have managed to convert an HTML file to docx successfully. However, I'm interested in inserting a header into the docx file.
The docx4j GitHub repository has a sample (link), which I used and which worked as expected, i.e.
Relationship relationship = createHeaderPart(wordMLPackage);
public static Relationship createHeaderPart(
WordprocessingMLPackage wordprocessingMLPackage)
throws Exception {
HeaderPart headerPart = new HeaderPart();
Relationship rel = wordprocessingMLPackage.getMainDocumentPart()
.addTargetPart(headerPart);
// After addTargetPart, so image can be added properly
headerPart.setJaxbElement(getHdr(wordprocessingMLPackage, headerPart));
return rel;
}
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
Part sourcePart) throws Exception {
Hdr hdr = objectFactory.createHdr();
// I modified it for simplicity
P headerParagraph = wordprocessingMLPackage.getMainDocumentPart().createParagraphOfText("hi there");
hdr.getContent().add(headerParagraph);
return hdr;
}
This works as expected.
However, I'm interested in using dynamic content from HTML, so I used:
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
Part sourcePart) throws Exception {
Hdr hdr = objectFactory.createHdr();
String html = "<html><body><p>hi there</p></body></html>";
XHTMLImporter XHTMLImporter = new XHTMLImporterImpl(wordprocessingMLPackage);
hdr.getContent().add(XHTMLImporter.convert(html, null));
return hdr;
}
This doesn't work at all. Any ideas?
I just noticed that XHTMLImporter.convert returns a list of objects, so taking an element from that list works:
public static Hdr getHdr(WordprocessingMLPackage wordprocessingMLPackage,
Part sourcePart) throws Exception {
Hdr hdr = objectFactory.createHdr();
String html = "<html><body><p>hi there</p></body></html>";
XHTMLImporter XHTMLImporter = new XHTMLImporterImpl(wordprocessingMLPackage);
List<Object> list = XHTMLImporter.convert(html, null);
hdr.getContent().add(list.get(0));
return hdr;
}
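The difference between the failing and the working version seems to come down to add versus addAll on a List<Object>. The sketch below uses plain java.util lists as stand-ins for hdr.getContent() and for the result of convert; the class and method names are hypothetical and no docx4j classes are involved. It only illustrates that add stores the whole list as a single element, while addAll appends each converted block.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AddVersusAddAll {

    // Stand-in (assumption) for the List<Object> returned by XHTMLImporter.convert(...).
    static List<Object> convertedBlocks() {
        return new ArrayList<>(Arrays.asList("paragraph 1", "paragraph 2"));
    }

    // Mirrors hdr.getContent().add(list): the list itself becomes one element.
    static List<Object> withAdd() {
        List<Object> content = new ArrayList<>();
        content.add(convertedBlocks());
        return content;
    }

    // Mirrors hdr.getContent().addAll(list): every converted block is appended.
    static List<Object> withAddAll() {
        List<Object> content = new ArrayList<>();
        content.addAll(convertedBlocks());
        return content;
    }

    public static void main(String[] args) {
        List<Object> added = withAdd();
        List<Object> addedAll = withAddAll();
        System.out.println(added.size() + " element(s), first is a List: "
                + (added.get(0) instanceof List));
        System.out.println(addedAll.size() + " element(s), first is a List: "
                + (addedAll.get(0) instanceof List));
    }
}
```

With real docx4j objects, hdr.getContent().addAll(XHTMLImporter.convert(html, null)) would keep every converted paragraph, whereas list.get(0) keeps only the first one.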
I am trying to extract the font style that is applied to a specific paragraph with Apache POI. The method getStyle() returns null on my XWPFParagraph object.
Calling the method getCTR().getRPr().getRStyle() on the first XWPFRun object also returns null.
Calling the method getStyle().getDocDefaults().getRPrDefault() on my XWPFDocument object returns this:
<w:rPr>
<w:rFonts w:asciiTheme="minorHAnsi"/>
<w:sz w:val="22"/>
<w:szCs w:val="22"/>
<w:lang w:val="en-GB" w:eastAsia="en-US" w:bidi="ar-SA"/>
</w:rPr>
There is no w:ascii attribute in the w:rFonts tag. There is, however, a w:asciiTheme attribute declared on it. How can I extract the information behind the given theme with Apache POI?
The font for this example is defined through the theme value minorHAnsi, and the theme itself can be found in the theme1.xml file. But how can I, for example, extract the attribute of the a:latin tag using Apache POI?
Here is a sample of what it looks like in the theme1.xml file:
<a:minorFont>
<a:latin typeface="Calibri"/>
<a:ea typeface=""/>
<a:cs typeface=""/>
<a:font script="Jpan" typeface="MS 明朝"/>
<a:font script="Hang" typeface="맑은 고딕"/>
<a:font script="Hans" typeface="宋体"/>
...
<a:font script="Viet" typeface="Arial"/>
<a:font script="Uigh" typeface="Microsoft Uighur"/>
<a:font script="Geor" typeface="Sylfaen"/>
</a:minorFont>
If the question is how to get /word/theme/theme1.xml out of the *.docx package, parse it, and then read <a:minorFont><a:latin... from it, then it can be solved like this:
First, use the methods of OPCPackage to get the package part /word/theme/theme1.xml.
...
XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));
OPCPackage oPCPackage = document.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
PackagePart themePart = oPCPackage.getPart(partName);
...
Then, once we have that PackagePart, parse it into an org.openxmlformats.schemas.drawingml.x2006.main.ThemeDocument and use the methods of ThemeDocument to get its child elements.
...
ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
CTOfficeStyleSheet theme = themeDocument.getTheme();
CTBaseStyles themeElements = theme.getThemeElements();
CTFontScheme fontScheme = themeElements.getFontScheme();
CTFontCollection minorFont = fontScheme.getMinorFont();
CTTextFont latin = minorFont.getLatin();
...
Unfortunately, there is no publicly available API documentation for org.openxmlformats.schemas.*. To get one, we need to download the sources of ooxml-schemas (for example from https://repo1.maven.org/maven2/org/apache/poi/ooxml-schemas/1.4/) and then use javadoc to generate API documentation from them.
Complete example:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.openxml4j.opc.*;
import org.openxmlformats.schemas.drawingml.x2006.main.*;
public class WordGetThemeDocument {
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("./WordExample.docx"));
OPCPackage oPCPackage = document.getPackage();
PackagePartName partName = PackagingURIHelper.createPartName("/word/theme/theme1.xml");
PackagePart themePart = oPCPackage.getPart(partName);
System.out.println(themePart);
ThemeDocument themeDocument = ThemeDocument.Factory.parse(themePart.getInputStream());
CTOfficeStyleSheet theme = themeDocument.getTheme();
CTBaseStyles themeElements = theme.getThemeElements();
CTFontScheme fontScheme = themeElements.getFontScheme();
CTFontCollection minorFont = fontScheme.getMinorFont();
CTTextFont latin = minorFont.getLatin();
System.out.println(latin);
String typeFace = latin.getTypeface();
System.out.println(typeFace);
document.close();
}
}
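As an alternative sketch, the same lookup can be done with only the JDK's DOM parser, which avoids needing the generated org.openxmlformats.schemas classes just to read one attribute. The class name ThemeFontLookup and the method latinTypeface are hypothetical; the sample XML in main is a cut-down stand-in for theme1.xml, and the code assumes the elements live in the DrawingML main namespace.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class ThemeFontLookup {

    static final String DRAWINGML_NS =
            "http://schemas.openxmlformats.org/drawingml/2006/main";

    // fontElement is "minorFont" or "majorFont"; returns null when nothing matches.
    static String latinTypeface(InputStream themeXml, String fontElement) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the namespace-based lookups below
        Document doc = factory.newDocumentBuilder().parse(themeXml);

        NodeList fonts = doc.getElementsByTagNameNS(DRAWINGML_NS, fontElement);
        if (fonts.getLength() == 0) {
            return null;
        }
        NodeList latin = ((Element) fonts.item(0)).getElementsByTagNameNS(DRAWINGML_NS, "latin");
        if (latin.getLength() == 0) {
            return null;
        }
        return ((Element) latin.item(0)).getAttribute("typeface");
    }

    public static void main(String[] args) throws Exception {
        // Cut-down sample; a real theme1.xml contains many more elements.
        String sample = "<a:theme xmlns:a=\"" + DRAWINGML_NS + "\" name=\"Office Theme\">"
                + "<a:themeElements><a:fontScheme name=\"Office\">"
                + "<a:majorFont><a:latin typeface=\"Cambria\"/></a:majorFont>"
                + "<a:minorFont><a:latin typeface=\"Calibri\"/></a:minorFont>"
                + "</a:fontScheme></a:themeElements></a:theme>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(latinTypeface(in, "minorFont"));
    }
}
```

In a real document, the stream passed in could be themePart.getInputStream() from the example above, with "minorFont" chosen because w:asciiTheme is minorHAnsi.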
I have a pdf template and with the following code I open it, edit, and then save it with another name after flattening it. But when I open the new pdf file, the fields are still editable.
public static void main(String[] args) throws IOException {
PDDocument doc = PDDocument.load(new File("template.pdf"));
PDDocumentCatalog docCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
for ( PDField field : acroForm.getFields()) {
if (field.getFieldType().equals("Tx")) {
field.setValue(field.getPartialName());
}
System.out.println(field.getFieldType());
}
acroForm.flatten();
doc.save("finalFile.pdf");
doc.close();
}
I read other questions about flattening, but none of them describes my problem.
Am I missing anything?
I'm on PDFBox 2.0.12.
I want to verify a PDF document using TestNG and PDFBox.
Is it possible to check whether the PDF contains some text, like this:
PDFParser parser = new PDFParser(stream);
parser.getDocument().contains("ABC")
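Try the code below:
// Uses the PDFParser API of PDFBox 1.8.x.
public void readPdf() throws Exception {
URL testUrl = new URL("http://www.axmag.com/download/pdfurl-guide.pdf");
BufferedInputStream testFile = new BufferedInputStream(testUrl.openStream());
PDFParser testPdf = new PDFParser(testFile);
testPdf.parse();
PDDocument document = testPdf.getPDDocument();
try {
// Extract the text of the whole document and check it for the expected phrase
String testText = new PDFTextStripper().getText(document);
Assert.assertTrue(testText.contains("Open the setting.xml, you can see it is like this"));
} finally {
document.close(); // release the parsed document even when the assertion fails
}
}
Download the libraries from https://pdfbox.apache.org/index.html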
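One practical caveat with a plain contains check: extracted PDF text follows the page layout, so a line break or extra spaces can fall inside the phrase being searched for. The sketch below is a small JDK-only helper that collapses whitespace on both sides before comparing. The names PdfTextAssertions and containsIgnoringLayout are hypothetical and not part of PDFBox or TestNG; the sample string only simulates extracted text.

```java
final class PdfTextAssertions {

    // Collapses every run of whitespace (spaces, tabs, line breaks) into one space.
    static String normalizeWhitespace(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    // True when the expected phrase occurs in the extracted text, ignoring layout whitespace.
    static boolean containsIgnoringLayout(String extractedText, String expected) {
        return normalizeWhitespace(extractedText).contains(normalizeWhitespace(expected));
    }

    public static void main(String[] args) {
        // Simulated extraction result with a line break inside the phrase.
        String extracted = "Open the setting.xml,\r\nyou can see   it is like this";
        String expected = "Open the setting.xml, you can see it is like this";
        System.out.println(extracted.contains(expected));               // false
        System.out.println(containsIgnoringLayout(extracted, expected)); // true
    }
}
```

In a test, the assertion would then read Assert.assertTrue(PdfTextAssertions.containsIgnoringLayout(text, "...")), where text is whatever PDFTextStripper returned.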