Reading text along with formatting from an Excel cell using Apache POI (Java)

I am trying to read the value of an MS Excel sheet cell that contains rich text. Example below:
Welcome to apache-poi world.
Please use latest version of poi.
As you can see, the text contains bold and italic words and a new paragraph. I want to read this text along with its style formatting, so that I can display it in the UI the way I entered it.
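One possible direction, offered as a sketch rather than a confirmed answer: POI returns a cell's rich text through `Cell.getRichStringCellValue()`, and for .xlsx files `XSSFRichTextString` describes its formatting runs via `numFormattingRuns()`, `getIndexOfFormattingRun(int)`, `getLengthOfFormattingRun(int)` and `getFontOfFormattingRun(int)`. The code below leaves POI out so that it stays self-contained. The `Run` class and the `toHtml` method are hypothetical names, and the runs are assumed to be sorted and non-overlapping; in real code they would be filled from the POI calls above.

```java
import java.util.ArrayList;
import java.util.List;

class RichTextToHtml {

    // Hypothetical stand-in for one formatting run of a rich-text cell:
    // start offset, length, and the bold/italic flags of the run's font.
    static final class Run {
        final int start;
        final int length;
        final boolean bold;
        final boolean italic;

        Run(int start, int length, boolean bold, boolean italic) {
            this.start = start;
            this.length = length;
            this.bold = bold;
            this.italic = italic;
        }
    }

    // Converts plain text plus formatting runs into simple HTML.
    // Assumes the runs are sorted by start offset and do not overlap.
    static String toHtml(String text, List<Run> runs) {
        StringBuilder html = new StringBuilder();
        int pos = 0;
        for (Run r : runs) {
            // Text between the previous run and this one carries no run formatting.
            if (r.start > pos) {
                html.append(escape(text.substring(pos, r.start)));
            }
            String piece = escape(text.substring(r.start, r.start + r.length));
            if (r.italic) {
                piece = "<i>" + piece + "</i>";
            }
            if (r.bold) {
                piece = "<b>" + piece + "</b>";
            }
            html.append(piece);
            pos = r.start + r.length;
        }
        // Trailing text after the last run.
        html.append(escape(text.substring(pos)));
        return html.toString();
    }

    // Escapes HTML special characters and turns cell line breaks into <br>.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\n", "<br>");
    }

    public static void main(String[] args) {
        List<Run> runs = new ArrayList<>();
        // Assumption for illustration only: "apache-poi" is the bold part.
        runs.add(new Run(11, 10, true, false));
        System.out.println(toHtml("Welcome to apache-poi world.", runs));
    }
}
```

Font colour, size and underline could be handled the same way by adding fields to `Run` and emitting a `<span style="...">` instead of `<b>`/`<i>`.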

Related

Java Copy Excel to Word without losing font style

I have an Excel file with a lot of sheets. I need to transfer some of that information to a Word file. If that were all, there wouldn't be a problem: the POI API offers everything I need for that task. The problem comes when I need to copy that information from an Excel cell to a Word file without losing the font style, since every cell has one or more specific font colors and I also need to preserve that information.
I know that POI provides the Cell.getCellStyle() method so you can save your cell style, but this is useful only if you want to copy one Excel file to another, which is not my case.
Do you know how to do what I need, or whether it is an impossible task? Maybe I am using the wrong API.
POI can certainly do it. You need to set the font on your new Word object.
For example:
Font font = wb.createFont();
font.setFontName("xxx");
CellStyle cellStyle = wb.createCellStyle();
cellStyle.setFont(font);
And if you would like to apply multiple fonts within one cell, here is the core code:
// Set one cell with different font styles
HSSFRichTextString textString = new HSSFRichTextString(fileHead);
textString.applyFont(0,fileHead.indexOf("("), font);
textString.applyFont(fileHead.indexOf("("),fileHead.length(), font3);
cell.setCellValue(textString);
Hope this helps.

Apache POI - PowerPoint Bar and Line Chart

I'm trying to make a bar and line chart in PowerPoint using Apache POI, like this:
I found a solution to the problem for Excel, but it creates the chart in Excel instead of PowerPoint. My question now is: is there a way to convert the code in the other solution into code that works for PowerPoint, so I can add the chart to a slide, and to take the input as lists instead of rows/columns in the Excel file?

How to replace some text in textbox of docx document using Apache POI?

I'm able to get the text of text boxes using the code described in the answer to "How to get text from textbox of MS word document using Apache POI?"
But I found no way to edit the text in a text box, for example to replace some placeholders. Iterating over the runs of embeddedPara and calling setText() does not work. I'm using Apache POI 3.13.

How to read inline text boxes from doc file using poi

I can read text boxes with anchors directly from the document and table streams, as described in the Microsoft Office format specifications.
But I have no idea how to read inline text boxes.
Please suggest an approach.
While reading a paragraph with a text box, I get a field character at the beginning of the text box. Please provide any code if you already have it.

Apache POI :- Get Headings from DOC file

I am experimenting with Apache POI to manipulate Word documents. Is there any way to get the headings from a doc file? I am able to get the plain text from the doc, but I need to distinguish all headings in the document. Is any function available in the Apache POI API to get only the headings from an MS Word file?
Promoting a comment to an answer
There are two ways to make a "Heading" in Word. The "proper" way, and the way that most people seem to do it...
1. In the styles dropdown, pick the appropriate header style, write your text, then go back to the normal paragraph style for the next line.
2. Highlight a line, and bump up the font size + make it bold or italic.
If your users are doing #2, you've basically no real hope of identifying the headings. Short of writing some fuzzy matching logic to try to spot when the font size jumps, you're out of luck.
For #1, it's fairly easy in Apache POI. What you'll want to do is grab the style description of the style that applies to a paragraph, then get the name of the style. If that starts with Heading (case insensitive), you know you've found a heading. Get the text of that paragraph, and move on through the document.
If you look at the Apache Tika MS-Word parser, which is built on top of POI, you'll see a good example there of iterating over the paragraphs and checking the styles.
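For case #2, the "fuzzy matching" idea could look roughly like the sketch below. It is only one possible heuristic, not something the answer above prescribes: paragraphs whose font size is clearly larger than the typical (median) size are flagged as likely headings. The class name, method name and the `ratio` threshold are all made up for illustration, and the sizes are passed in as a plain array; with POI they might be collected from the runs of each paragraph (for example via `XWPFRun.getFontSize()`, which can report "unset" for runs that inherit their size).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FontSizeHeadingGuess {

    // Returns the indices of paragraphs whose font size is at least
    // `ratio` times the median size (upper middle value for even counts).
    static List<Integer> likelyHeadings(int[] sizes, double ratio) {
        List<Integer> hits = new ArrayList<>();
        if (sizes.length == 0) {
            return hits;
        }
        int[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int median = sorted[sorted.length / 2];
        for (int i = 0; i < sizes.length; i++) {
            if (sizes[i] >= median * ratio) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical per-paragraph font sizes in points.
        int[] sizes = {18, 11, 11, 14, 11, 11};
        System.out.println(likelyHeadings(sizes, 1.2));
    }
}
```

A heuristic like this will misfire on documents that are mostly headings or that use large text for emphasis, so it is best treated as a fallback when no heading styles are present.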
Just as Gagravarr says:
For #1, it's fairly easy in Apache POI. What you'll want to do is grab the style description of the style that applies to a paragraph, then get the name of the style. If that starts with Heading (case insensitive), you know you've found a heading. Get the text of that paragraph, and move on through the document.
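Using Apache POI, the code looks like this:
import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFStyle;
import org.apache.poi.xwpf.usermodel.XWPFStyles;

File f = new File("test.docx");
FileInputStream fis = new FileInputStream(f);
XWPFDocument xdoc = new XWPFDocument(OPCPackage.open(fis));
XWPFStyles styles = xdoc.getStyles();
List<XWPFParagraph> xwpfparagraphs = xdoc.getParagraphs();
System.out.println();
for (int i = 0; i < xwpfparagraphs.size(); i++) {
    // The style ID is null for paragraphs without an explicit style
    String styleid = xwpfparagraphs.get(i).getStyleID();
    System.out.println("paragraph style id " + (i + 1) + ":" + styleid);
    if (styleid != null) {
        XWPFStyle style = styles.getStyle(styleid);
        if (style != null) {
            String name = style.getName();
            System.out.println("Style name:" + name);
            // Compare case-insensitively, as the quoted answer recommends
            if (name != null && name.toLowerCase().startsWith("heading")) {
                // this is a heading
            }
        }
    }
}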
At least for HWPF (i.e. the old binary doc format) and if you have a properly formatted file (so type #1 of the other answers) you should not rely exclusively on the style name - in fact, this may be a language-dependent value ("Heading" in English, "Titre" in French, etc.).
Paragraph.getLvl(), which encodes the level where the respective paragraph is shown in Word's outline view, often makes a good secondary source. 1 constitutes the most significant level, all subsequent numbers up to 8 stand for less significant heading candidates and 9 is the value that Word assigns to ordinary (non-heading) paragraphs by default.
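To make the combination of the two signals concrete, here is a minimal sketch. It takes the numbering described above at face value (1 to 8 for heading candidates, 9 for ordinary paragraphs) and treats a paragraph as a heading candidate if either the style name or the outline level says so. The class and method names are hypothetical; in HWPF code the `lvl` argument would come from `Paragraph.getLvl()`, and the style name lookup is left out here.

```java
import java.util.Locale;

class HeadingCheck {

    // Style-name test; only reliable for English style names.
    static boolean nameLooksLikeHeading(String styleName) {
        return styleName != null
                && styleName.toLowerCase(Locale.ROOT).startsWith("heading");
    }

    // Outline-level test, following the 1..8 / 9 convention described above.
    static boolean outlineLevelLooksLikeHeading(int lvl) {
        return lvl >= 1 && lvl <= 8;
    }

    // A paragraph is a heading candidate if either signal fires.
    static boolean isHeadingCandidate(String styleName, int lvl) {
        return nameLooksLikeHeading(styleName) || outlineLevelLooksLikeHeading(lvl);
    }

    public static void main(String[] args) {
        // A French style name is missed by the name test but caught by the level.
        System.out.println(isHeadingCandidate("Titre 1", 1));
        System.out.println(isHeadingCandidate("Normal", 9));
    }
}
```

Using "or" rather than "and" keeps localized documents working; whether that is too permissive depends on how the documents in question were authored.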
