I have a Word document (.docx) that already contains some information. I want to edit this document and add some text to it. The text should stay invisible when the document is opened, but I also want to be able to access it easily from my code. Does anyone have an idea how I can proceed?
I was in fact looking for a solution with Apache POI to make my text invisible in the generated Word document. After some research, I found this solution:
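for (XWPFParagraph paragraph : doc.getParagraphs()) {
    for (XWPFRun run : paragraph.getRuns()) {
        CTR ctr = run.getCTR();
        // getRPr() returns null for runs without run properties, so create them on demand
        CTRPr rPr = ctr.isSetRPr() ? ctr.getRPr() : ctr.addNewRPr();
        // an empty vanish element marks the run as hidden text
        rPr.setVanish(CTOnOff.Factory.newInstance());
    }
}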
This code hides every run in every paragraph of the Word document, so all of the text becomes invisible to the user.
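If only the added text should be hidden, rather than the whole document, one option is to hide just the new run and later collect the runs that carry the hidden flag. Here is a minimal sketch, assuming a POI version in which XWPFRun offers setVanish(boolean) and isVanish(); the class and method names are made up for illustration.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of POI.
class HiddenTextSketch {

    // Appends a new paragraph whose single run is flagged as hidden text.
    static void addHiddenText(XWPFDocument doc, String text) {
        XWPFRun run = doc.createParagraph().createRun();
        run.setText(text);
        run.setVanish(true);
    }

    // Returns the text of every body-paragraph run that carries the hidden flag.
    static List<String> readHiddenText(XWPFDocument doc) {
        List<String> hidden = new ArrayList<>();
        for (XWPFParagraph paragraph : doc.getParagraphs()) {
            for (XWPFRun run : paragraph.getRuns()) {
                if (run.isVanish()) {
                    hidden.add(run.text());
                }
            }
        }
        return hidden;
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("visible text");
            addHiddenText(doc, "note for my code only");
            System.out.println(readHiddenText(doc));
        }
    }
}
```

Note that doc.getParagraphs() only covers body paragraphs, so hidden runs inside tables, headers or footers would need their own loop. Hidden text is also not a secret: a user can still display it by turning on the hidden text option in Word.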
How do I number the pages of a Word file from Java?
I am using Apache POI to work with Word files from Java.
I want a border as well as page numbers in the Word file that I create from Java.
Please help.
The question marked as a duplicate has a complex answer to a relatively simple question.
The simple answer (for the page number) is very similar to this answer: https://stackoverflow.com/a/40264237/2296441. The difference is just which field to insert. The aforementioned answer shows how to insert a TOC field. In your case you want a PAGE field.
XWPFParagraph p;
...
// get or create your paragraph
....
CTP ctP = p.getCTP();
CTSimpleField page = ctP.addNewFldSimple();
page.setInstr("PAGE");
page.setDirty(STOnOff.TRUE);
Note: setDirty tells Word to update the field which causes a dialog to be opened when the document is opened. This dialog is MS Word making sure you want to update the field. I don't think you can disable the dialog and still have the field calculated on open.
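Page numbers usually belong in a footer so that they repeat on every page. The same PAGE field can be placed there. Here is a rough sketch, assuming POI 3.16 or later (where XWPFDocument.createFooter(HeaderFooterType) exists); the class name, method name and output file name are invented for the example. It leaves out setDirty on the assumption that Word computes PAGE fields in a footer while laying out the page, which is worth verifying against your Word version.

```java
import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

import java.io.FileOutputStream;

// Hypothetical helper class, not part of POI.
class PageNumberFooterSketch {

    // Creates the default footer and puts a centered PAGE field into it.
    static XWPFParagraph addPageNumberFooter(XWPFDocument doc) {
        XWPFFooter footer = doc.createFooter(HeaderFooterType.DEFAULT);
        XWPFParagraph paragraph = footer.createParagraph();
        paragraph.setAlignment(ParagraphAlignment.CENTER);
        // Same simple-field technique as above, only inside the footer part.
        paragraph.getCTP().addNewFldSimple().setInstr("PAGE");
        return paragraph;
    }

    public static void main(String[] args) throws Exception {
        // "page-numbers.docx" is just an example file name.
        try (XWPFDocument doc = new XWPFDocument();
             FileOutputStream out = new FileOutputStream("page-numbers.docx")) {
            doc.createParagraph().createRun().setText("Body text");
            addPageNumberFooter(doc);
            doc.write(out);
        }
    }
}
```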
To set page borders you are once again going to have to break into the CT classes. In this case the appropriate location in the document is the section properties. Here is how to set a double line border around the whole page set back 24 points from the page edge.
// Page Borders
CTDocument1 ctDoc = doc.getDocument();
CTBody ctBody = ctDoc.getBody();
CTSectPr ctSectPr = ctBody.isSetSectPr() ? ctBody.getSectPr() : ctBody.addNewSectPr();
CTPageBorders ctPgBorders = ctSectPr.isSetPgBorders() ? ctSectPr.getPgBorders() : ctSectPr.addNewPgBorders();
ctPgBorders.setOffsetFrom(STPageBorderOffset.PAGE);
CTBorder ctBorder = CTBorder.Factory.newInstance();
ctBorder.setVal(STBorder.DOUBLE);
ctBorder.setSpace(new BigInteger("24"));
ctPgBorders.setTop(ctBorder);
ctPgBorders.setBottom(ctBorder);
ctPgBorders.setRight(ctBorder);
ctPgBorders.setLeft(ctBorder);
Disclaimer
The MS-Word functionality in POI is still largely unfinished, and subject to change.
I am trying to retrieve the text of an article. I want to select all of the text within
<p>... </p>
I was able to do that.
But I only want to retrieve the text from the article body, not from the entire page.
Document article = Jsoup.connect("html doc").get();
Elements paragraphs = article.select("p");
The code above gets the entire text from the page. I just want the text between
<article itemprop= "articleBody">...</article>
I'm sorry if this was hard to understand, I tried to formulate the
question as best I could.
Elements#text() returns the combined text-only content of all the matched paragraphs (see here for more details: https://jsoup.org/apidocs/org/jsoup/select/Elements.html).
Try selecting on the itemprop attribute:
// "article" is the Document from the question
for (Element body : article.select("article[itemprop=articleBody]")) {
    System.out.println(body.text());
}
See CSS Selectors for more tips
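To keep the paragraphs separate, the attribute selector can be combined with a descendant p selector. Here is a small self-contained sketch that parses an inline HTML string instead of fetching a page; the class name, method name and sample markup are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.util.List;

// Hypothetical helper class for illustration.
class ArticleBodySketch {

    // Returns the text of each <p> nested inside <article itemprop="articleBody">.
    static List<String> articleParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("article[itemprop=articleBody] p").eachText();
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Site navigation</p>"
                + "<article itemprop=\"articleBody\">"
                + "<p>First paragraph.</p><p>Second paragraph.</p>"
                + "</article>"
                + "<p>Footer</p></body></html>";
        for (String paragraph : articleParagraphs(html)) {
            System.out.println(paragraph);
        }
    }
}
```

Paragraphs outside the article element (the navigation and footer lines in the sample) are not matched by the selector.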
I was hoping to get some help in how I should approach a program I have attempted to write a few times now.
I have a number of folders. In each folder, there is an HTML file and a .txt file which contains text from the HTML file, stripped of all HTML tags.
As an example, a simplified HTML file may be
<html><head></head><body><p>This is some <b>text</b></p><p>Please ignore me</p></body></html>
And within a .txt in the same folder, I have "This is some text".
From these two files, I would like to create a new file which is a HTML with a box drawn around "This is some text", like so :
The obvious problem here is that the pretty-printed text files do not contain any mark-up, and so finding it within the HTML document is difficult.
My idea thus far has been :
-Save the .txt contents in a variable.
-Grab the HTML contents, strip of all HTML tags :
public static String html2text(String html) {
    return Jsoup.parse(html).text();
}
I'm unsure how to proceed from this point. I mean...I could try to add a div with a class surrounding the text, and then add a border style to this...but how do I find the sub-string in the HTML reliably, retaining all of the markup within the HTML ?
I'm sure there is a simple way to do this and I am just overthinking it, I would usually have a chat with a friend about this and solve it but everyone seems to be offline - so I come to you for guidance here.
Can anyone offer any feedback please? Thanks.
This should work for you (see the documentation on selectors and on setting attribute values for more information):
private void test() {
    // replace with your stored variables
    String html = "<html><head></head><body><p>This is some <b>text</b></p><p>Please ignore me</p></body></html>";
    String txt = "This is some text";
    Document doc = Jsoup.parse(html);
    String query = "p:contains(" + txt + ")";
    Elements htmlTxt = doc.select(query); // selects all the paragraph elements with your target txt
    // Loop through each element and add a red border around it
    for (Element e : htmlTxt) {
        System.out.println("e: " + e.toString());
        e.attr("style", "border:3px; border-style:solid; border-color:#FF0000; padding: 1em;");
    }
}
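One caveat with building the selector from the file contents: :contains does a case-insensitive substring match, and a query string assembled from arbitrary text may not survive characters such as parentheses. A possible variant compares each paragraph's text directly and returns the modified HTML. This is only a sketch; the class and method names are invented, and it assumes the .txt content is separated by single spaces the way Element.text() output is.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration.
class BoxAroundTextSketch {

    // Adds a red border to every <p> whose tag-free text equals txt,
    // then returns the whole document as HTML with inner markup kept.
    static String boxMatchingParagraphs(String html, String txt) {
        Document doc = Jsoup.parse(html);
        String target = txt.trim();
        for (Element paragraph : doc.select("p")) {
            // text() drops the tags and normalizes whitespace
            if (paragraph.text().equals(target)) {
                paragraph.attr("style", "border: 3px solid #FF0000; padding: 1em;");
            }
        }
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        String html = "<html><head></head><body><p>This is some <b>text</b></p>"
                + "<p>Please ignore me</p></body></html>";
        System.out.println(boxMatchingParagraphs(html, "This is some text"));
    }
}
```

The returned string can then be written to the new HTML file. If the text in the .txt file spans several paragraphs, an exact per-paragraph comparison like this will not match and a different strategy is needed.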
I have a Word document with the .docx extension. I want to add a header with some information and a footer with the page number on each page.
I do not know how to add a header and footer to a Word document.
I am using the open source edition of docx4j with Java.
Start by looking at samples/HeaderFooterCreate.java
Basically you create the header and footer parts, and add them as rels of the MainDocumentPart. Then you reference these rels appropriately from the sectPr element.
For the actual content of your header/footer parts, I'd suggest you create a docx containing what you want in Word, then use the docx4j webapp or Helper AddIn to generate corresponding code.
I'm generating a docx file using Apache POI 3.13 and I am stuck with the headers/footers for the first page.
I create the XWPFParagraph[] arrays without any problem. Next I create the headers and footers like this (I've tried different orders):
policy.createHeader(XWPFHeaderFooterPolicy.DEFAULT, defaultHeader);
policy.createFooter(XWPFHeaderFooterPolicy.DEFAULT, defaultFooter);
policy.createHeader(XWPFHeaderFooterPolicy.FIRST, firstHeader);
policy.createFooter(XWPFHeaderFooterPolicy.FIRST, firstFooter);
Once I generate my docx file, I see my default header/footer on every page, including the first one. But if I manually select the option in Word to use a different header/footer for the first page, my first-page header and footer appear correctly.
How can I make this happen automatically via code? And is there any good documentation with examples for POI?
If you want to set a first page header in a section, you must enter a title page tag in section properties tag (w:sectPr). The title page tag can be empty, but it is necessary. In your case, you can add only 2 code lines:
CTSectPr sect = document.getDocument().getBody().getSectPr();
sect.addNewTitlePg();
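One thing to watch for: getSectPr() can return null when the body has no section properties yet. A slightly more defensive sketch of the same two lines follows; the class and method names are made up for illustration.

```java
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSectPr;

// Hypothetical helper class, not part of POI.
class TitlePageSketch {

    // Makes sure the section properties contain a title page tag,
    // creating the section properties first if they are missing.
    static void enableFirstPageHeaderFooter(XWPFDocument document) {
        CTBody body = document.getDocument().getBody();
        CTSectPr sect = body.isSetSectPr() ? body.getSectPr() : body.addNewSectPr();
        if (!sect.isSetTitlePg()) {
            sect.addNewTitlePg();
        }
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument document = new XWPFDocument()) {
            enableFirstPageHeaderFooter(document);
            System.out.println(document.getDocument().getBody().getSectPr().isSetTitlePg());
        }
    }
}
```

Calling the method more than once is harmless, because the tag is only added when it is not already set.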
Best regards!