Replacing a text in Apache POI XWPF not working Extension - java

I am using the last answer from this question:
Replacing a text in Apache POI XWPF not working.
Thanks to Josh.
It works perfectly in almost all scenarios, but sometimes the color is not applied properly to the replaced text.
Am I missing something?

Runs are funny things. I know that the solution in this Stack Overflow question works great to replace sections of paragraphs or parts of runs that have different formatting (bold, embossed, etc) scattered throughout a given paragraph. For my particular use-case, the replace function was able to replace strings mid-run and handle any particular formatting that we were encountering. I didn't personally look at the color, but it appears to have functionality to do so: newRun.setColor(run.getColor());
Note that I originally was using Apache POI 3.11 and the code was giving me a lot of errors like "The method isEmbossed() is undefined for the type XWPFRun". Upgrading to 3.15 solved this.
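One thing that might be worth ruling out (this is an assumption on my part, the question does not confirm it) is that the text being replaced straddles several runs with different colors, so the color copied via newRun.setColor(run.getColor()) comes from a different run than expected. Below is a minimal plain-Java sketch that reports which runs a match spans. RunSpanFinder and findRunSpan are hypothetical names, and the list of strings stands in for the text of each run in a paragraph; it does not call POI itself.

```java
import java.util.Arrays;
import java.util.List;

class RunSpanFinder {

    /**
     * Returns {firstRunIndex, lastRunIndex} for the first occurrence of target
     * in the concatenated run texts, or null if there is no match.
     * runTexts is a stand-in for the text of each run in one paragraph.
     */
    static int[] findRunSpan(List<String> runTexts, String target) {
        if (target == null || target.isEmpty()) {
            return null;
        }
        StringBuilder all = new StringBuilder();
        for (String t : runTexts) {
            all.append(t == null ? "" : t);
        }
        int start = all.indexOf(target);
        if (start < 0) {
            return null;
        }
        int end = start + target.length() - 1; // index of the last matched character
        int first = -1;
        int last = -1;
        int offset = 0; // index of the current run's first character in the joined text
        for (int i = 0; i < runTexts.size(); i++) {
            String t = runTexts.get(i);
            int len = (t == null) ? 0 : t.length();
            if (len > 0) {
                if (first < 0 && start < offset + len) {
                    first = i;
                }
                if (end < offset + len) {
                    last = i;
                    break;
                }
            }
            offset += len;
        }
        return new int[] {first, last};
    }

    public static void main(String[] args) {
        // Made-up sample: a placeholder that Word has split across two runs.
        List<String> runs = Arrays.asList("Hello ", "${na", "me}", "!");
        System.out.println(Arrays.toString(findRunSpan(runs, "${name}")));
    }
}
```

If the two indices differ, the match crosses a run boundary, and it is worth checking which of those runs the replacement takes its color from.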

Related

How can I change text orientation in XWPFDocument?

I am currently producing Word documents in Java using XWPFDocument from Apache POI. The final document must look like this:
http://sk.uploads.im/t/rtwvm.png
So far everything works fine: I created a table and managed to merge cells, but I cannot find a way to change the text orientation in table cells. I simply want "Type 1" to run upward.
The only solution I found uses cellStyle, which seems to work only in Excel and not in Word, which is what I am using.
You probably need to create two documents in Word, one with the normal orientation and one with the changed one, then unzip them (a .docx is actually a ZIP file) and analyze which XML structure is responsible for this.
Then you can check if POI already offers higher level APIs for these or if you need to access the low-level POI classes via the getCTxxx() methods, e.g. XWPFTableCell.getCTTc() returns the underlying XML structure and allows you to do things that are not possible via the normal POI interfaces.
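To compare the two documents without unzipping them by hand, something along these lines could dump the main document part. It is only a sketch using java.util.zip from the JDK; DocxXmlDump and readEntry are hypothetical names, and it assumes the body XML sits in the usual word/document.xml entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class DocxXmlDump {

    /** Returns the content of one entry of a .docx (ZIP) stream as text, or null if the entry is absent. */
    static String readEntry(InputStream docx, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toString("UTF-8");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java DocxXmlDump <file.docx>");
            return;
        }
        // Assumption: the document body is stored under word/document.xml.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(readEntry(in, "word/document.xml"));
        }
    }
}
```

Run it once per document and diff the two outputs; the element that differs inside the cell properties is the one to look for in the getCTxxx() classes.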
You can use something like: cell.getCTTc().getTcPr().addNewTextDirection().setVal(STTextDirection.BT_LR). Note that getTcPr() can return null when the cell has no properties element yet; in that case call addNewTcPr() first.
The possible values are listed in STTextDirection.
The problem I have not been able to solve yet is that the row height does not adjust automatically to the length of the vertical text, so the text is not shown completely. If you solve it, please post here.

Getting paragraph styles in apache POI, language specific

Is it possible to get the styles of a paragraph in a particular language? For example, my personal computer has a Dutch installation of Microsoft Windows. As a result, the paragraph.getStyles() method returns the Dutch style names: instead of the usual "heading1", "heading2", etc., I receive values such as "Kop1", "kop2".
I am creating a parser for Word documents which selects certain parts based on style. Does anyone have any experience with this?
I would take a look at the data in the .docx file (it's a zip-file) to verify if the data is written this way by Word already or "transposed" by POI or some local functionality.
If the data is already written by Word you will need to check how you can create the document in a different language in Word.
If not, then if you are using POI 3.13 or newer, you can try to set a different locale via LocaleUtil.setUserLocale() and see if that affects the results.
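For the first check (what Word itself wrote), a small JDK-only sketch like the following lists every style id next to its name. It assumes the usual WordprocessingML layout, where styles live in word/styles.xml and each w:style element carries a w:styleId attribute and a w:name child; StyleNameDump and styleIdToName are hypothetical names.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StyleNameDump {

    // Namespace of the "w:" prefix in WordprocessingML.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps each w:styleId to the w:val of its w:name child (empty string if there is none). */
    static Map<String, String> styleIdToName(InputStream stylesXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(stylesXml);
        Map<String, String> result = new LinkedHashMap<>();
        NodeList styles = doc.getElementsByTagNameNS(W_NS, "style");
        for (int i = 0; i < styles.getLength(); i++) {
            Element style = (Element) styles.item(i);
            NodeList names = style.getElementsByTagNameNS(W_NS, "name");
            String name = names.getLength() > 0
                    ? ((Element) names.item(0)).getAttributeNS(W_NS, "val")
                    : "";
            result.put(style.getAttributeNS(W_NS, "styleId"), name);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java StyleNameDump <file.docx>");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            // Assumption: the style definitions are stored under word/styles.xml.
            ZipEntry entry = zip.getEntry("word/styles.xml");
            if (entry == null) {
                System.err.println("no word/styles.xml in " + args[0]);
                return;
            }
            try (InputStream in = zip.getInputStream(entry)) {
                styleIdToName(in).forEach((id, name) -> System.out.println(id + " -> " + name));
            }
        }
    }
}
```

If the output shows a localized id such as "Kop1" next to a language-neutral name, the parser could select on the name instead of the id; whether that is the case for your files is exactly what this check is meant to tell you.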

How do I extract data from pptx file using Apache POI?

I am using XSLFPowerPointExtractor to extract text from a pptx file. However, all the text in the pptx file is returned to me in a single string. Is there any way I can get the text of each slide separately? I am completely new to this concept, so please give detailed answers.
I looked up the API documentation and it seems that it's either all or nothing. The API documentation has a method called getText() which returns the entire text for all the slides which is exactly the behavior you are observing.
A bit more googling showed me that the way to do it is to use another API namely XMLSlideShow. That gives you a slide-by-slide access to the presentation.
From there, you can access the different shapes including the text areas from which you can read the text. As a matter of fact, this is explained in this other SO question which I believe will help you resolve your issue: How to get pptx slide notes text using apache poi?
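To make the slide-by-slide idea concrete without pulling in any dependency, here is a rough JDK-only sketch. To be clear, this is not the XMLSlideShow API recommended above; it swaps in a different technique and reads the slide parts straight out of the .pptx ZIP. It assumes the usual package layout (ppt/slides/slideN.xml, text in a:t elements grouped by a:p paragraphs), and PptxSlideText and its methods are hypothetical names.

```java
import java.io.InputStream;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PptxSlideText {

    // Namespace of the "a:" (DrawingML) prefix used for text in slides.
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    // Assumption: slide parts are named ppt/slides/slide<N>.xml.
    static final Pattern SLIDE = Pattern.compile("ppt/slides/slide(\\d+)\\.xml");

    /** Joins the a:t runs of each a:p paragraph; paragraphs are separated by newlines. */
    static String textOfSlideXml(InputStream slideXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(slideXml);
        NodeList paragraphs = doc.getElementsByTagNameNS(A_NS, "p");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < paragraphs.getLength(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(A_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                sb.append(runs.item(j).getTextContent());
            }
        }
        return sb.toString();
    }

    /** Returns the text per slide part, keyed by the number N in slideN.xml. */
    static Map<Integer, String> textBySlideFile(String pptxPath) throws Exception {
        Map<Integer, String> result = new TreeMap<>();
        try (ZipFile zip = new ZipFile(pptxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                Matcher m = SLIDE.matcher(entry.getName());
                if (m.matches()) {
                    try (InputStream in = zip.getInputStream(entry)) {
                        result.put(Integer.parseInt(m.group(1)), textOfSlideXml(in));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java PptxSlideText <file.pptx>");
            return;
        }
        textBySlideFile(args[0]).forEach((n, text) ->
                System.out.println("--- slide" + n + ".xml ---\n" + text));
    }
}
```

One caveat: the number in the file name is not guaranteed to match the order in which slides appear in the presentation, so for anything beyond a quick look the slide-level API mentioned above is the safer route.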

Java Apache POI read Word (.doc) file and get named CHARACTER styles used

This follows on from:
Java Apache POI read Word (.doc) file and get named styles used
At the time (10/2012) there was a solution for finding paragraph styles but not character styles.
And yet... if you use LibreOffice Writer to open a Word doc, for example, it does translate styles and highlighting from .doc to .odt ... so someone somewhere appears to have cracked this...
I don't know whether the Apache POI team and the LibreOffice/OpenOffice teams are in any way related, but I'd have thought the Apache POI team would've been able to get this functionality from the LO source code. Am I being naive?
Promoting some comments to an answer:
If you look at the answer given in Java Apache POI read Word (.doc) file and get named styles used, you'll see how Apache Tika extracts paragraph style names. Taken from the Paragraph javadoc:
public short getStyleIndex()
Returns the index of the style which applies to this Paragraph. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int)
In your case, what you're after is the equivalent but for a Character Run. That is also (now) possible, as given in the CharacterRun.getStyleIndex() javadocs
public short getStyleIndex()
Returns the index of the base style which applies to this Run. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int).
Note that runs typically override some of the style properties from the base, so normally style information should be fetched directly from the CharacterRun itself.
To see this in action, a good example is given in the TestRangeProperties unit test. From there, we see code like this:
Range r = u.getRange();
StyleSheet ss = r._doc.getStyleSheet();
Paragraph p1 = r.getParagraph(0);
CharacterRun c1a = p1.getCharacterRun(0);
assertEquals("Normal", ss.getStyleDescription(c1a.getStyleIndex()).getName());
That shows you how to get the name of the base style applied to a Character Run
One final thing: you'll need to use either a nightly build, or wait a bit for 3.11 beta 1, as some of the code mentioned isn't in 3.10 final.
Use:
paragraph.getCTP().getPPr().getRPr().isSetB()

POI Excel formula translated?

I am using German Excel 2007, and therefore the English formulas are not evaluated automatically. I have no clue where or how to make the German Excel evaluate English formulas.
So I tried using the German ones, but this only throws a FormulaParseException. Setting the formula directly as the cell value is obviously wrong, because the content is not evaluated. I thought perhaps I could turn off the evaluation or parsing, but had no real success. I have seen that I can write my own function, but to be honest, I want to use an existing built-in Excel function.
Can anybody give me a hint on how to use COUNTIF in German Excel, or how to convince POI to accept ZAEHLENWENN?
Take a look at this question.
As stated in the answer, Apache POI doesn't support multiple languages, so you will have to use the English formulas to make it work.
