Get the number of pages in a Word document using Aspose (Java)

How can I get the number of pages in a Word document (.doc or .docx) using Aspose for Java?
Alternatively, is there a way to get the number of pages in a Word document in Java without using Aspose?

You can use the Document.getPageCount method to get the page count of a .doc or .docx file in Aspose.Words for Java. Sample code:
//Open the Word Document
Document doc = new Document("C:\\Data\\Image2.doc");
//Get page count
int pageCount = doc.getPageCount();
//Print Page Count
System.out.println(pageCount);
Hope this helps.

To open a document from a stream, pass a stream object that contains the document to the Document constructor. The code sample below opens a document from a stream and prints the number of pages.
String dataDir = "D:\\Temp\\";
String filename = "input.docx";
// try-with-resources closes the stream even if loading the document fails
try (InputStream in = new FileInputStream(dataDir + filename)) {
    Document doc = new Document(in);
    System.out.println("Document opened. Total pages are " + doc.getPageCount());
}
I work with Aspose as Developer Evangelist.
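
On the second part of the question (page count without Aspose): a .docx file is a ZIP package, and Word normally stores a cached page count in the Pages element of docProps/app.xml. The sketch below reads that value using only the JDK. Treat it as a shortcut under assumptions, not a layout engine: the number is whatever the last saving application wrote, so it can be stale or missing, and it does not apply to the binary .doc format. The names DocxPageCount and cachedPageCount are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Aspose or any other library.
final class DocxPageCount {

    /** Returns the cached page count from docProps/app.xml, or -1 if it is absent. */
    static int cachedPageCount(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"docProps/app.xml".equals(entry.getName())) {
                    continue;
                }
                // Reads only the current entry; the stream reports EOF at the entry's end.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                NodeList pages = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xml))
                        .getElementsByTagNameNS("*", "Pages");
                if (pages.getLength() == 0) {
                    return -1; // app.xml exists but carries no page count
                }
                return Integer.parseInt(pages.item(0).getTextContent().trim());
            }
        }
        return -1; // no docProps/app.xml in the package
    }
}
```

Calling DocxPageCount.cachedPageCount(new FileInputStream("D:\\Temp\\input.docx")) would return the stored count for that file. If an exact, freshly computed count is required, a library that performs page layout (such as Aspose.Words) is still needed.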

Related

How to read data from multiple HTML files and populate a single docx/pdf using Aspose?

I need to read the data from each HTML file and add it as one section in the docx document. The same should apply to multiple HTML files, with each file's data added as its own section in the single document.

You can use the Document.appendDocument method. In this case, each appended document is added to the destination document as a separate Section node (if the source document has only one section). For example:
// List of input Html documents.
String[] files = new String[]{"C:\\temp\\in1.html", "C:\\temp\\in2.html", "C:\\temp\\in3.html"};
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("This is the main document where HTML documents will be appended");
// Append HTML documents.
for (String path : files) {
    Document subDoc = new Document(path);
    doc.appendDocument(subDoc, ImportFormatMode.USE_DESTINATION_STYLES);
}
doc.save("C:\\Temp\\out.docx");
If you are using DocumentBuilder.insertHtml method to insert HTML, you should use DocumentBuilder.insertBreak to insert section break between the inserted HTML parts:
// List of input Html documents.
String[] files = new String[]{"C:\\temp\\in1.html", "C:\\temp\\in2.html", "C:\\temp\\in3.html"};
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.write("This is the main document where HTML documents will be appended");
// Append HTML documents.
for (String path : files) {
    // Insert section break
    builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);
    // Insert HTML
    builder.insertHtml(Files.readString(Path.of(path)));
}
doc.save("C:\\Temp\\out.docx");

Printing different values (Copy number) in different copies of a particular document

My application uses an RTF file with merge fields as its source and creates a PDF file from it using Aspose.Words. The users of this application give the resulting document to their clients, so a copy of the same document is printed for each client. The only difference between the copies is the copy number at the end of each one.
For now, let's say there are 4 clients, so 4 copies of the same document are printed, differing only in their copy numbers. I achieve this by creating the same document 4 times; each time I insert my HTML text, fill the merge fields, add the copy number, and then append the document. In the end, I have one big document containing all 4 created documents.
Here is my code for it. There was a lot of code, so I trimmed it down to the relevant parts:
import com.aspose.words.*;
Document docAllAppended = new Document(loadDocument("/documents/" + RTFFileName));
Document docTemp=null;
for (int i = 1; i <= copyNumber; i++) {
    docTemp = new Document(loadDocument("/documents/" + RTFFileName));
    DocumentBuilder builder = new DocumentBuilder(docTemp);
    //insert html which includes file context
    builder.insertHtml(htmlText);
    //insert Copy number
    builder.moveToBookmark("sayfa");
    Font font = builder.getFont();
    font.setBold(true);
    font.setSize(8);
    builder.write("Copy Number-" + i + " / ");
    font.setBold(false);
    docAllAppended.appendDocument(docTemp, ImportFormatMode.USE_DESTINATION_STYLES);
}
This seems unnecessary and performs poorly. Also, every time my users change the number of copies to print, my application recalculates everything from scratch. Is there a way to make this faster, or to avoid rebuilding everything when the number of copies changes? So far I haven't found much.
Thanks in advance.

If the only difference is the copy number, you can prepare the document just once (insert the HTML, run the merge, and so on).
Then, in a for loop, set the copy number and save the document as DOCX or PDF. Appending the documents in the loop is not necessary; you can save each copy under a different name.
import com.aspose.words.*;
// Prepare a single document once; the builder and the bookmark must refer to the same document
Document doc = new Document(loadDocument("/documents/" + RTFFileName));
DocumentBuilder builder = new DocumentBuilder(doc);
//insert html which includes file context
builder.insertHtml(htmlText);
// In for loop, only update the copy number
for (int i = 1; i <= copyNumber; i++) {
    // Use DocumentBuilder for font setting
    builder.moveToBookmark("sayfa");
    Font font = builder.getFont();
    font.setBold(true);
    font.setSize(8);
    builder.write("dummy value");
    font.setBold(false);
    // Use Bookmark for setting the actual value
    Bookmark bookmark = doc.getRange().getBookmarks().get("sayfa");
    bookmark.setText("Copy Number-" + i + " / ");
    // Save the document for each client (Common.DATA_DIR stands for your output folder)
    doc.save(Common.DATA_DIR + "Letter-Client-" + i + ".docx");
}
I work with Aspose as Developer Evangelist.

Converting a pdf to word document using java

I've successfully converted JPEG to PDF using Java, but I don't know how to convert PDF to Word using Java. The code for converting JPEG to PDF is given below.
Can anyone tell me how to convert PDF to Word (.doc/.docx) using Java?
import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;
public class JpegToPDF {
    public static void main(String[] args) {
        try {
            Document convertJpgToPdf = new Document();
            PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
                    "c:\\java\\ConvertImagetoPDF.pdf"));
            convertJpgToPdf.open();
            Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
            convertJpgToPdf.add(convertJpg);
            convertJpgToPdf.close();
            System.out.println("Successfully Converted JPG to PDF in iText");
        } catch (Exception i1) {
            i1.printStackTrace();
        }
    }
}

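In fact, you need two libraries, both open source. The first one is iText, which is used to extract the text from the PDF file. The second one is POI, which is used to create the Word document.
The code is quite simple:
import java.io.FileOutputStream;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
//Create the word document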
XWPFDocument doc = new XWPFDocument();
// Open the pdf file
String pdf = "myfile.pdf";
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
// Read the PDF page by page
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
    // Extract the text
    String text = strategy.getResultantText();
    // Create a new paragraph in the word document, adding the extracted text
    XWPFParagraph p = doc.createParagraph();
    XWPFRun run = p.createRun();
    run.setText(text);
    // Adding a page break
    run.addBreak(BreakType.PAGE);
}
// Write the word document
FileOutputStream out = new FileOutputStream("myfile.docx");
doc.write(out);
// Close all open files
out.close();
reader.close();
Beware: with this extraction strategy, you will lose all formatting. You can address this by plugging in your own, more complex extraction strategy.

You can use the 7-PDF library.
Have a look at this; it may help:
http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html
PS: iText has some issues when the given file is a non-RGB image, so try this out!

Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows one to connect to it through a TCP port, and it's possible to use that to convert documents. If this looks like an acceptable solution, JODConverter can help you.

Get a thumbnail of a Word document in Java using Apache POI

I am working on a web sharing project in JSF. In this project, users can upload documents such as .doc, .pdf, .ppt, etc. I want to show the first page of each document as a thumbnail. After some googling I found Apache POI. Does anybody have a suggestion for my problem? How can I get a thumbnail image of a Word document's first page? I tried the code below, but it only gets the first picture that the Word document contains:
POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream("d:\\test.doc"));
HWPFDocument doc = new HWPFDocument(fs);
PicturesTable pt=doc.getPicturesTable();
List<Picture> p=pt.getAllPictures();
BufferedImage image=ImageIO.read(new ByteArrayInputStream(p.get(0).getContent()));
ImageIO.write(image, "JPG", new File("d:\\test.jpg"));

What you are doing will not produce a thumbnail of the page. HWPFDocument can only extract a thumbnail that is embedded in the document (when saving the file, check the 'add preview' option), so it works only for documents that were saved with one.
Even to do that, you need something like this:
static byte[] process(File docFile) throws Exception {
    final HWPFDocumentCore wordDocument = AbstractWordUtils.loadDoc(docFile);
    SummaryInformation summaryInformation = wordDocument.getSummaryInformation();
    System.out.println(summaryInformation.getAuthor());
    System.out.println(summaryInformation.getApplicationName() + ":" + summaryInformation.getTitle());
    Thumbnail thumbnail = new Thumbnail(summaryInformation.getThumbnail());
    System.out.println(thumbnail.getClipboardFormat());
    System.out.println(thumbnail.getClipboardFormatTag());
    return thumbnail.getThumbnailAsWMF();
}
After that, you will probably have to convert the WMF data to a more common format (JPEG, PNG, ...). ImageMagick can help.

Remove page from PDF

I'm currently using iText and I'm wondering if there is a way to delete a page from a PDF file?
I have opened it up with a reader etc., and I want to remove a page before it is then saved back to a new file; how can I do that?

A better way to 'delete' pages is to select only the pages you want to keep:
reader.selectPages("1-5,10-12");
This selects only pages 1-5 and 10-12, effectively 'deleting' pages 6-9.
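When the pages to drop are known only at run time, the range string can be generated instead of typed by hand. The following is a minimal sketch using only the JDK; PageRanges and keepRanges are hypothetical names, and the result is meant to be passed to selectPages in place of the literal string.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; not part of iText.
final class PageRanges {

    /**
     * Builds a range string such as "1-5,10-12" that lists every page of a
     * totalPages-page document except the 1-based pages in pagesToRemove.
     */
    static String keepRanges(int totalPages, int... pagesToRemove) {
        Set<Integer> removed = new HashSet<>();
        for (int p : pagesToRemove) {
            removed.add(p);
        }
        StringBuilder sb = new StringBuilder();
        int start = -1; // first page of the current run of kept pages, -1 if none is open
        // Iterate one past the last page so that a run ending at the last page is flushed.
        for (int page = 1; page <= totalPages + 1; page++) {
            boolean keep = page <= totalPages && !removed.contains(page);
            if (keep && start < 0) {
                start = page;
            } else if (!keep && start > 0) {
                int end = page - 1;
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(start == end ? String.valueOf(start) : start + "-" + end);
                start = -1;
            }
        }
        return sb.toString();
    }
}
```

For a 12-page document, PageRanges.keepRanges(12, 6, 7, 8, 9) yields "1-5,10-12", so reader.selectPages(PageRanges.keepRanges(reader.getNumberOfPages(), 6, 7, 8, 9)) would be the dynamic equivalent. If every page is removed the result is an empty string, which the caller should check for before selecting.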

Get a reader for the existing PDF file:
PdfReader pdfReader = new PdfReader("source pdf file path");
Now update the reader so that it keeps only the pages you want:
pdfReader.selectPages("1-5,15-20");
Then create a PdfStamper object to write the changes to a file:
PdfStamper pdfStamper = new PdfStamper(pdfReader,
        new FileOutputStream("destination pdf file path"));
Close the PdfStamper:
pdfStamper.close();
It will close the PdfReader too.
Cheers.....

For iText 7 I found this example:
PdfReader pdfReader = new PdfReader(PATH + name + ".pdf");
PdfDocument srcDoc = new PdfDocument(pdfReader);
PdfDocument resultDoc = new PdfDocument(new PdfWriter(PATH + name + "_cut.pdf"));
resultDoc.initializeOutlines();
srcDoc.copyPagesTo(1, 2, resultDoc);
resultDoc.close();
srcDoc.close();
See also here: clone-reordering-pages
and here: clone-splitting-pdf-file

You can use a PdfStamper in combination with PdfCopy.
In this answer it is explained how to copy a whole document. If you change the criteria for the loop in the sample code you can remove the pages you don't need.

Here is a removal function ready for real-life usage. Proven to work OK with iText 2.1.7. It also avoids "stringly typed" page ranges.
/**
 * Removes given pages from a document.
 * @param reader document
 * @param pagesToRemove pages to remove; 1-based
 */
public static void removePages(PdfReader reader, int... pagesToRemove) {
    int pagesTotal = reader.getNumberOfPages();
    List<Integer> allPages = new ArrayList<>(pagesTotal);
    for (int i = 1; i <= pagesTotal; i++) {
        allPages.add(i);
    }
    for (int page : pagesToRemove) {
        allPages.remove(Integer.valueOf(page));
    }
    reader.selectPages(allPages);
}
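The boxing in the removal loop matters because List<Integer> has two remove overloads: remove(Object) deletes the matching element, while remove(int) deletes by index. Below is a small JDK-only sketch of the same bookkeeping without iText; the class name KeepList and the method pagesToKeep are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; mirrors the list handling of removePages without needing a PdfReader.
final class KeepList {

    /** Returns the 1-based page numbers that remain after dropping pagesToRemove. */
    static List<Integer> pagesToKeep(int totalPages, int... pagesToRemove) {
        List<Integer> pages = new ArrayList<>(totalPages);
        for (int i = 1; i <= totalPages; i++) {
            pages.add(i);
        }
        for (int page : pagesToRemove) {
            // Boxed argument selects remove(Object): the element equal to `page` is deleted.
            // Passing the plain int would select remove(int index) and delete the wrong entry.
            pages.remove(Integer.valueOf(page));
        }
        return pages;
    }
}
```

KeepList.pagesToKeep(5, 2, 4) returns [1, 3, 5]. By contrast, calling remove(2) with a plain int on the list [1, 2, 3, 4, 5] deletes the element at index 2 and leaves [1, 2, 4, 5].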
