I am developing an application in which I use the Apache POI library to generate .docx files.
I am unable to apply table styles with XWPFTable. Has anyone worked on this? I could not find any examples or good documentation.
Here is my snippet.
int nRows = 14;
int nCols = 6;
XWPFTable t1 = doc.createTable(nRows, nCols);
t1.setStyleID("Table Grid");
Thanks in advance
I stumbled on this issue too. I created an empty .docx file that contains all the styles I need (Heading 1, Heading 2, etc.) and create the XWPFDocument from it:
try {
InputStream resourceAsStream = new FileInputStream("protocol_empty.docx");
document = new XWPFDocument(resourceAsStream);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Then I add my paragraph, setting the style with ....setStyle("Heading 1");
It works.
Got the answer. I added a template with a few styles in it, and it worked.
This question helped me
I had trouble identifying the styleId to use. If you make a template, add a table with the desired style to it, and export it as a Word XML file, you can look up the styleId there. For me, "Light List" was actually "LightList" (w:style w:type="table" w:styleId="LightList").
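As an alternative to exporting Word XML by hand, the lookup can be sketched in plain Java. A .docx file is a ZIP package, and its style definitions live in the entry word/styles.xml. The class and method names below (TableStyleIds, fromStylesXml, fromDocx) are made up for illustration and are not part of POI; the sketch assumes the template defines the table style you are after.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: maps the display name of each table style to its styleId.
class TableStyleIds {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses the content of word/styles.xml and returns name -> styleId for table styles. */
    static Map<String, String> fromStylesXml(InputStream stylesXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to resolve the w: prefix
            Document dom = factory.newDocumentBuilder().parse(stylesXml);
            Map<String, String> result = new LinkedHashMap<>();
            NodeList styles = dom.getElementsByTagNameNS(W_NS, "style");
            for (int i = 0; i < styles.getLength(); i++) {
                Element style = (Element) styles.item(i);
                if (!"table".equals(style.getAttributeNS(W_NS, "type"))) {
                    continue; // skip paragraph, character and numbering styles
                }
                NodeList names = style.getElementsByTagNameNS(W_NS, "name");
                String name = names.getLength() > 0
                        ? ((Element) names.item(0)).getAttributeNS(W_NS, "val")
                        : "";
                result.put(name, style.getAttributeNS(W_NS, "styleId"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse styles.xml", e);
        }
    }

    /** Opens the .docx as a ZIP archive and reads its word/styles.xml entry. */
    static Map<String, String> fromDocx(String docxPath) {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/styles.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no word/styles.xml in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return fromStylesXml(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling TableStyleIds.fromDocx on the template and printing the map should show which styleId belongs to which display name; that styleId is the string to pass to setStyleID.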
Related
I wrote the following code in the NetBeans editor, in a Java UI application, to generate a PDF using iText.
String result="";
Document doc=new Document();
PdfWriter writer=null;
DecimalFormat df=new DecimalFormat("0.00");
try
{
writer=PdfWriter.getInstance(doc, new FileOutputStream(new File("second.pdf")));
doc.addAuthor("abcdefg");
doc.addCreationDate();
doc.addProducer();
doc.addCreator("abcdefg");
doc.addTitle("Invoice for company");
doc.setPageSize(PageSize.A4);
doc.open();
doc.newPage();
doc.add(new Paragraph("new paragraph"));
PdfPTable table = new PdfPTable(1);
PdfPCell cellValue = new PdfPCell(new Phrase("Header 1"));
cellValue.setColspan(1);
table.addCell(cellValue);
doc.add(table);
result= "Successfully created preview.Please check the document";
}
catch(Exception e)
{
result= "Document already opened. Please close it";
System.out.println("exception came");
}
finally
{
if(doc!=null)
{
doc.close();
}
if(writer!=null)
{
writer.close();
}
return result;
}
The same code generates the PDF when I hit the Run button in NetBeans, but when I double-click the .jar file in the project folder it cannot add the table to the PDF and fails with the exception IOException Document has no pages.
There are two interesting things happening here:
1) When I remove the table-adding part from my code, it works fine in both cases. This suggests that the NetBeans build itself is working properly.
2) When I add the table part, the exception is raised, but the catch and finally blocks do not seem to execute. I say this because if they had run properly, the document would definitely have been closed. Instead, when I double-click the file that my app generated, I am told that the file is already in use and that it is damaged. This makes me think that the catch and finally blocks are not executing.
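On the second observation, two Java rules may be worth keeping in mind: a finally block always runs, and a return statement inside finally discards whatever exception or error is still propagating. In addition, catch(Exception e) does not catch Error subclasses such as NoClassDefFoundError. The sketch below shows both effects with a simulated error. The class name is made up, and whether this is what actually happens when the JAR is double-clicked is only an assumption that would need checking, for example by temporarily catching Throwable and printing its stack trace.

```java
// Hypothetical demo, not taken from the question's project.
class FinallyReturnDemo {
    @SuppressWarnings("finally")
    static String run(boolean failWithError) {
        String result = "";
        try {
            if (failWithError) {
                // Simulates a library class that is missing at run time.
                throw new NoClassDefFoundError("simulated missing library class");
            }
            result = "ok";
        } catch (Exception e) {
            // Not reached for Error subclasses: NoClassDefFoundError is not an Exception.
            result = "caught";
        } finally {
            // Returning here silently discards the Error thrown above.
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + run(false) + "]"); // prints [ok]
        System.out.println("[" + run(true) + "]");  // prints []
    }
}
```

In the second call the caller gets an empty string and never sees the error, which can look as if neither catch nor finally had executed.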
I've successfully converted a JPEG to PDF using Java, but I don't know how to convert a PDF to Word using Java. The code for converting the JPEG to PDF is given below.
Can anyone tell me how to convert PDF to Word (.doc/.docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
public class JpegToPDF {
public static void main(String[] args) {
try {
Document convertJpgToPdf = new Document();
PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
"c:\\java\\ConvertImagetoPDF.pdf"));
convertJpgToPdf.open();
Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
convertJpgToPdf.add(convertJpg);
convertJpgToPdf.close();
System.out.println("Successfully Converted JPG to PDF in iText");
} catch (Exception i1) {
i1.printStackTrace();
}
}
}
In fact, you need two libraries, both of them open source. The first is iText, which is used to extract the text from the PDF file. The second is POI, which is used to create the Word document.
The code is quite simple:
//Create the word document
XWPFDocument doc = new XWPFDocument();
// Open the pdf file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
// Extract the text
String text=strategy.getResultantText();
// Create a new paragraph in the word document, adding the extracted text
XWPFParagraph p = doc.createParagraph();
XWPFRun run = p.createRun();
run.setText(text);
// Adding a page break
run.addBreak(BreakType.PAGE);
}
// Write the word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: with the extraction strategy used here, you will lose all formatting. You can fix this by plugging in your own, more complex extraction strategy.
You can use the 7-PDF library.
Have a look at this, it may help:
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: iText has some issues when the given file is a non-RGB image, so try this out!
Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows one to connect to it through a TCP port, and that can be used to convert documents. If this looks like an acceptable solution, JODConverter can help you.
What I want is:
I have PDF documents in English and in another language, always in pairs with the same content. I want to merge exactly these pairs, but I have tons of them (the files in a pair have quite similar names), so I don't want to do it manually.
What I do have:
Some experience with Java and VBA, plus Microsoft Office XP and Adobe Acrobat Professional 8 already installed. Any other software has to be free, if needed.
Can you help me find a solution? Anything would help; I only found solutions for newer versions of Excel and Acrobat Professional on the web.
This is easily accomplished using iText, which is an open source package that processes PDFs. There are Java and C# versions.
Here is some example Java code that I wrote to merge several PDFs:
/**
* Merges multiple PDFs into a single PDF.
* @param fileName full path name of the output PDF
* @param childPdfs full path names of the PDFs to merge
*/
public static void mergePdfs(String fileName, String [] childPdfs) {
try {
Document doc = new Document();
PdfCopy copyDoc = new PdfCopy(doc, new FileOutputStream(fileName));
doc.open();
for (int i = 0; i < childPdfs.length; i++) {
PdfReader reader = new PdfReader(childPdfs [i]);
int pageCnt = reader.getNumberOfPages();
for (int j = 1; j <= pageCnt; j++) {
copyDoc.addPage(copyDoc.getImportedPage(reader, j));
}
reader.close();
}
doc.close();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
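The question also needs the pairs to be found automatically. Here is a minimal sketch of that step in plain Java, assuming a naming convention the question does not actually state: each file ends in an underscore plus a two-letter language code, such as report_en.pdf and report_de.pdf. The class PdfPairs and its methods are hypothetical, and the pattern in pairKey would have to be adapted to the real file names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups file names that differ only in an assumed language suffix.
class PdfPairs {
    /** Strips ".pdf" and an assumed trailing "_xx" language code: "report_en.pdf" -> "report". */
    static String pairKey(String fileName) {
        String base = fileName.replaceFirst("(?i)\\.pdf$", "");
        return base.replaceFirst("_[A-Za-z]{2}$", "");
    }

    /** Returns only complete pairs, keyed by their common base name. */
    static Map<String, List<String>> groupPairs(List<String> fileNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : fileNames) {
            groups.computeIfAbsent(pairKey(name), k -> new ArrayList<>()).add(name);
        }
        // Drop anything that is not exactly a pair (orphans, triples, ...).
        groups.values().removeIf(files -> files.size() != 2);
        // Sort each pair so that the merge order is deterministic.
        for (List<String> files : groups.values()) {
            Collections.sort(files);
        }
        return groups;
    }
}
```

Each resulting pair can then be passed to mergePdfs as the childPdfs array (for example with files.toArray(new String[0])), using the map key to build the output file name.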
I'm writing Java code that uses Apache POI to read an MS Office .doc file and the iText API to create and write a PDF file. Reading the text and the tables in the .doc file already works. Now I'm looking for a way to read the images in the document. I have written the following code to read them. Why is it not working?
public static void main(String[] args) {
POIFSFileSystem fs = null;
Document document = new Document();
WordExtractor extractor = null ;
try {
fs = new POIFSFileSystem(new FileInputStream("C:\\DATASTORE\\tableandImage.doc"));
HWPFDocument hdocument=new HWPFDocument(fs);
extractor = new WordExtractor(hdocument);
OutputStream fileOutput = new FileOutputStream(new File("C:/DATASTORE/tableandImage.pdf"));
PdfWriter.getInstance(document, fileOutput);
document.open();
Range range=hdocument.getRange();
String readText=null;
PdfPTable createTable;
CharacterRun run;
PicturesTable picture;
for(int i=0;i<range.numParagraphs();i++) {
Paragraph par = range.getParagraph(i);
readText=par.text();
if(!par.isInTable()) {
if(readText.endsWith("\n")) {
readText=readText+"\n";
document.add(new com.itextpdf.text.Paragraph(readText));
} if(readText.endsWith("\r")) {
readText += "\n";
document.add(new com.itextpdf.text.Paragraph(readText));
}
run =range.getCharacterRun(i);
picture=hdocument.getPicturesTable();
if(picture.hasPicture(run)) {
//if(run.isSpecialCharacter()) {
Picture pic=picture.extractPicture(run, true);
byte[] picturearray=pic.getContent();
com.itextpdf.text.Image image=com.itextpdf.text.Image.getInstance(picturearray);
document.add(image);
}
} else if (par.isInTable()) {
Table table = range.getTable(par);
TableRow tRow1= table.getRow(0);
int numColumns=tRow1.numCells();
createTable=new PdfPTable(numColumns);
for (int rowId=0;rowId<table.numRows();rowId++) {
TableRow tRow = table.getRow(rowId);
for (int cellId=0;cellId<tRow.numCells();cellId++) {
TableCell tCell = tRow.getCell(cellId);
PdfPCell c1 = new PdfPCell(new Phrase(tCell.text()));
createTable.addCell(c1);
}
}
document.add(createTable);
}
}
}catch(IOException e) {
System.out.println("IO Exception");
e.printStackTrace();
}
catch(Exception exep) {
exep.printStackTrace();
}finally {
document.close();
}
}
The problems are:
1. The condition if(picture.hasPicture(run)) is never satisfied, although the document contains a JPEG image.
2. I'm getting the following exception while reading the table:
java.lang.IllegalArgumentException: This paragraph is not the first one in the table
at org.apache.poi.hwpf.usermodel.Range.getTable(Range.java:876)
at pagecode.ReadDocxOrDocFile.main(ReadDocxOrDocFile.java:113)
Can anybody help me solve these problems?
Thank you.
Regarding your exception:
Your code iterates over all paragraphs and calls isInTable() for each one of them. Since tables are commonly composed of several such paragraphs, your call to getTable() also gets executed several times for a single table.
However, what your code should do instead is to find the first paragraph of a table, then process all paragraphs therein (via getRow(m).getCell(n)) and ultimately continue with the outer loop in the first paragraph after the table. Codewise this may look roughly like the following (assuming no merged cells, no nested tables and no other funny edge cases):
if (par.isInTable()) {
Table table = range.getTable(par);
for (int rn=0; rn<table.numRows(); rn++) {
TableRow row = table.getRow(rn);
for (int cn=0; cn<row.numCells(); cn++) {
TableCell cell = row.getCell(cn);
for (int pn=0; pn<cell.numParagraphs(); pn++) {
Paragraph cellParagraph = cell.getParagraph(pn);
// your PDF conversion code goes here
}
}
}
i += table.numParagraphs()-1; // skip the already processed (table-)paragraphs in the outer loop
}
Regarding the pictures issue:
Am I guessing right that you are trying to obtain the picture which is anchored within a given paragraph? Unfortunately, the predefined methods of POI only work if the picture is not embedded within a field (which is rather rare, actually). For field-based images (i.e. preview images of embedded OLEs) you should do something like the following (untested!):
PictureStore pictureStore = new PictureStore(hdocument);
// bla bla ...
for (int cr=0; cr < par.numCharacterRuns(); cr++) {
CharacterRun characterRun = par.getCharacterRun(cr);
Field field = hdocument.getFields().getFieldByStartOffset(FieldsDocumentPart.MAIN, characterRun.getStartOffset());
if (field != null && field.getType() == 0x3A) { // 0x3A is type "EMBED"
Picture pic = pictureStore.getPicture(field.secondSubrange(characterRun));
}
}
For a list of possible values of Field.getType() see here.
I am working with PDFBox, and its documentation seems sparse, so I've come here for some help. I am trying to print a PDF form that I've created, with fields populated dynamically by Eclipse. I can load it and print it, but the fields I've set don't show up in the printout (although they do when I save the file to disk). Can someone point me to the setting that makes the fields visible when printing? I saw that iText has something similar, and I'm hoping that PDFBox does too.
Here is my current code.
PDDocument doc = null;
try{
doc = PDDocument.load("resources/orderForm.pdf");
PDDocumentCatalog docCatalog = doc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDField field = acroForm.getField("Orderer");
field.setValue("JohnTest");
} catch (IOException ie){
System.out.println(ie);
}
//doc.addPage(new PDPage());
try{
//doc.save("Empty PDF.pdf");
doc.silentPrint();
//doc.print();
doc.close();
} catch (Exception io){
System.out.println(io);
}
}
I found my answer: I can't use PDFBox to do it, although the alternative is just as simple. Use the desktop to print the file! Example code follows:
public void printOrder(){
try {
File myFile = new File(finished);
//Desktop.getDesktop().open(myFile);
Desktop.getDesktop().print(myFile);
doc.close();
} catch (IOException ex) {
// no application registered for PDFs
}
}
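A slightly more defensive variant of the same idea is sketched below. Desktop.getDesktop() throws on platforms without desktop support, and not every platform supports the PRINT action, so it may be worth checking both before printing. The class and method names (DesktopPrint, tryPrint) are made up for this example.

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;

// Hypothetical helper around java.awt.Desktop printing.
class DesktopPrint {
    /** Returns true only if the print request was handed over to the desktop. */
    static boolean tryPrint(File file) {
        if (!file.isFile()) {
            return false; // nothing to print
        }
        if (!Desktop.isDesktopSupported()) {
            return false; // e.g. a headless server
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.PRINT)) {
            return false; // this platform cannot print through Desktop
        }
        try {
            desktop.print(file);
            return true;
        } catch (IOException | RuntimeException e) {
            // IOException: no application is registered for printing this file type.
            // RuntimeException covers SecurityException and similar refusals.
            e.printStackTrace();
            return false;
        }
    }
}
```

Note that the actual printing is done by whatever application the operating system associates with PDF files, so a true return value only means the request was accepted.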