Getting paragraph styles in apache POI, language specific

Is it possible to get the styles of a paragraph in a particular language? For example, my personal computer has a Dutch installation of Microsoft Windows. As a result, the paragraph.getStyles() method returns the Dutch values of the styles: instead of the usual "heading1", "heading2", etc., I receive values such as "Kop1", "Kop2".
I am creating a parser for Word documents that selects certain parts based on style. Does anyone have any experience with this?

I would take a look at the data in the .docx file (it's a zip-file) to verify if the data is written this way by Word already or "transposed" by POI or some local functionality.
If the data is already written by Word you will need to check how you can create the document in a different language in Word.
If not, then if you are using POI 3.13 or newer, you can try to set a different locale via LocaleUtil.setUserLocale() and see if that affects the results.
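The first check can also be done without unzipping by hand, by reading word/styles.xml straight out of the .docx. The sketch below uses only the JDK (no POI) and assumes Java 9 or newer for readAllBytes; DocxStyleNames and readStyleNames are names made up for this illustration. It lists each style's w:styleId next to its w:name, so you can see which of the two carries the localized value in your file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of POI: inspects the raw styles part of a .docx.
final class DocxStyleNames {
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Maps every w:styleId found in word/styles.xml to the value of its w:name element. */
    static Map<String, String> readStyleNames(InputStream docx) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/styles.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Copy the entry into memory first, because the parser may close its input.
                Document dom = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(zip.readAllBytes()));
                NodeList styles = dom.getElementsByTagNameNS(W_NS, "style");
                for (int i = 0; i < styles.getLength(); i++) {
                    Element style = (Element) styles.item(i);
                    NodeList names = style.getElementsByTagNameNS(W_NS, "name");
                    String name = names.getLength() > 0
                            ? ((Element) names.item(0)).getAttributeNS(W_NS, "val")
                            : "";
                    result.put(style.getAttributeNS(W_NS, "styleId"), name);
                }
                break; // only one styles part is expected
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: DocxStyleNames <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            readStyleNames(in).forEach((id, name) -> System.out.println(id + " -> " + name));
        }
    }
}
```

If the output shows the Dutch value in one attribute and an English value in the other, a parser can match on whichever one is stable across installations.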

Related

How can I change text orientation in XWPFDocument?

I am currently producing Word documents in Java using XWPFDocument from Apache POI. The final document must look like this:
http://sk.uploads.im/t/rtwvm.png
Until now everything has worked fine: I created the table and managed to merge cells, but I cannot find a way to change the text orientation in table cells. I simply want "Type 1" to read upward.
I only found a solution using cellStyle, which seems to work only in Excel and not in Word, which is what I am using.

You probably need to create two documents in Word, one with the normal orientation and one with the changed one, then unzip them (.docx is actually a Zip-File) and analyze which xml-structure is responsible for this.
Then you can check if POI already offers higher level APIs for these or if you need to access the low-level POI classes via the getCTxxx() methods, e.g. XWPFTableCell.getCTTc() returns the underlying XML structure and allows you to do things that are not possible via the normal POI interfaces.

You can use something like: cell.getCTTc().getTcPr().addNewTextDirection().setVal(STTextDirection.BT_LR),
where the possible values are listed in STTextDirection. If the cell has no properties yet, getTcPr() returns null and you need to call addNewTcPr() first.
The problem I couldn't solve yet is that the row height does not adjust automatically to the length of the vertical text, so the text is not shown completely. If you solve it, please post here.

Java Apache POI read Word (.doc) file and get named CHARACTER styles used

this follows on from here:
Java Apache POI read Word (.doc) file and get named styles used
at the time (10/2012) there was a solution to finding paragraph styles but not character styles.
And yet... if you use LibreOffice Writer to open a Word doc, for example, it does translate styles and highlighting from .doc to .odt ... so someone somewhere appears to have cracked this...
I don't know whether the Apache POI team and the LibreOffice/OpenOffice teams are in any way related, but I'd have thought the Apache POI team would've been able to get this functionality from the LO source code. Am I being naive?

Promoting some comments to an answer:
If you look at the answer given in Java Apache POI read Word (.doc) file and get named styles used, you'll see how Apache Tika extracts paragraph style names. Taken from the Paragraph javadoc:
public short getStyleIndex()
Returns the index of the style which applies to this Paragraph. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int)
In your case, what you're after is the equivalent but for a Character Run. That is also (now) possible, as given in the CharacterRun.getStyleIndex() javadocs
public short getStyleIndex()
Returns the index of the base style which applies to this Run. Details of the style can be looked up from the StyleSheet, via StyleSheet.getStyleDescription(int).
Note that runs typically override some of the style properties from the base, so normally style information should be fetched directly from the CharacterRun itself.
To see this in action, a good example is given in the TestRangeProperties unit test. From there, we see code like this:
Range r = u.getRange();
StyleSheet ss = r._doc.getStyleSheet();
Paragraph p1 = r.getParagraph(0);
CharacterRun c1a = p1.getCharacterRun(0);
assertEquals("Normal", ss.getStyleDescription(c1a.getStyleIndex()).getName());
That shows you how to get the name of the base style applied to a Character Run.
One final thing: you'll need to use either a nightly build, or wait a bit for 3.11 beta 1, as some of the code mentioned isn't in 3.10 final.

use
paragraph.getCTP().getPPr().getRPr().isSetB()

Change decimal and thousands separators in excel using Apache POI

Does anyone know if using apache-poi library you can change the decimal and thousands separators for Microsoft Excel?
I need to export some data from a web application to Excel, and the numbers are formatted depending on the user's settings, so when the data is exported the numbers should look exactly as they do on the application's page.
Thanks

You need to set the data format on your CellStyle like this (for integers with a thousands separator):
cellStyle.setDataFormat(creationHelper.createDataFormat().getFormat("#,##0"));
cell.setCellStyle(cellStyle);
For decimals I think you need something like this (I didn't try it, so you may need to modify it a little): #,##0.00
Please note: it is very important that you use a comma, and not a dot, as the thousands separator in the format string. If your locale is set correctly, you will see a dot.
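The same convention can be tried out without Excel, because java.text.DecimalFormat patterns work in a comparable way: the comma and dot in the pattern are placeholders, and the characters actually printed come from the locale. This is only an analogy in plain Java, not POI or Excel code, and SeparatorDemo is a made-up name.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Illustration only: shows pattern placeholders versus locale-specific symbols.
final class SeparatorDemo {
    /** Formats a value with the fixed pattern "#,##0.00" using the symbols of the given locale. */
    static String format(double value, Locale locale) {
        DecimalFormat df = new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(locale));
        return df.format(value);
    }

    public static void main(String[] args) {
        // Same pattern, different separators in the output.
        System.out.println(format(1234567.891, Locale.US));      // 1,234,567.89
        System.out.println(format(1234567.891, Locale.GERMANY)); // 1.234.567,89
    }
}
```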

Formatting in Excel is controlled through the Tools > Options > International dialogs, and is stored in local preferences, not in a file. So you can't control this through POI.
The only solution I can think of is to provide text rather than numbers, but that will prevent the user from doing any calculation in Excel.

The format string only describes a pattern. In it, the comma always stands for the thousands separator and the dot for the decimal separator. Whether you use "#,##0.00" or "#,##0" does not matter here: the separator characters that are actually displayed come from Microsoft Excel's local settings, which apply to the application and not to a file, so you cannot override them via the API.
Remember, the sheet has predefined cell styles, and a cell only holds a reference to a style. If you change the style on one cell, you change all cells that use that style.
I have the same issue with cell formats. I think I will try the "setVBAProject" method on XSSFWorkbook.
https://social.technet.microsoft.com/Forums/office/en-US/eaa4c7f6-197a-4b33-bc5f-20896e5a7e3a/workbook-or-worksheet-specific-decimal-separator?forum=excel

Approach for Automating localized Web application in Selenium using Java Bindings

I am automating test cases for a web application using Selenium 2.0 and Java. My application supports multiple languages. Some of the test cases require me to validate text that appears in the UI, such as success/error messages. I am using a properties file to store whatever UI text I refer to in my tests, currently only in English. For example, there is locale_english.properties (see below) that contains all references in English. I am going to have multiple properties files like this for different locales, such as locale_chinese.properties, locale_french.properties and so on. For locales other than English, the corresponding properties file would contain Unicode escapes (e.g. \u30ed) representing the native characters (see below). So if I want to test, say, the Chinese UI, I would load "locale_chinese.properties" instead of "locale_english.properties". I am going to convert the native characters for non-English locales using perhaps native2ascii from the JDK or some other way. I tested that the Selenium API works well with such characters for non-English locales.
---locale_english.properties------
user.login.error= Please verify username/password
---locale_chinese.properties------
user.login.error= \u30ed\u30ef\u30eg\u30eh\u30ed
and so on.
The problem is that my locale_english.properties is growing and going out of control. It is becoming hard to manage a single properties file for one locale let alone for multiple locales. Is there a better way of handling localization in Java, particularly in situations like I am in?
Thanks!

You're right that there is a problem managing the files, but you're also right that this is the best approach. Some things are just hard :-(
Selenium (at least the Selenium RC API) does indeed support Unicode input and output; we have lots of tests that enter and confirm Cyrillic and Simplified Chinese characters from C#. Since Java strings are Unicode at the core (just like C#), I expect you could simply create the files in a UTF-8-friendly editor like Notepad++, read them straight into strings, and use them directly in the Selenium API.
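A minimal sketch of that idea, using java.util.Properties with an explicit UTF-8 reader so the file can contain native characters directly. LocaleTexts and its methods are hypothetical names and have nothing to do with the Selenium API. Properties.load(Reader) still decodes \uXXXX escapes as well, so files already converted with native2ascii should keep working; the escapes must be valid four-digit hexadecimal values, otherwise loading fails.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper for loading per-locale UI texts in a test suite.
final class LocaleTexts {
    /** Loads key=value pairs from any character source. */
    static Properties load(Reader reader) throws IOException {
        Properties texts = new Properties();
        texts.load(reader);
        return texts;
    }

    /** Loads a properties file that was saved as UTF-8, e.g. locale_chinese.properties. */
    static Properties loadUtf8(Path file) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return load(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // First input: escaped form, as native2ascii would write it.
        Properties escaped = load(new StringReader("user.login.error=\\u30ed\\u30ef\n"));
        // Second input: the same two characters in native form, as in a UTF-8 file.
        Properties nativeText = load(new StringReader("user.login.error=\u30ed\u30ef\n"));
        System.out.println(escaped.getProperty("user.login.error")
                .equals(nativeText.getProperty("user.login.error"))); // true
    }
}
```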

This is how I solved the issue for those who are interested.

A database would work better for many reasons: it can grow, it is a central location, it is kept outside of the app, and it can be edited and maintained outside of the app. We used a table with columns:
id (int) auto increment
id_text -- this and other columns are varchar ... except for date time for last 2
lang
translation
created_by
updated_by
created_date
updated_date
An id is a short English description of the text, like 'hello' or 'error1msg'; it is the key in your map.
In Java we had a function to get the text for a particular id, and an app-level property for the default language (usually en, but good to keep it configurable).
The function would scan the already loaded HashMap for the language asked for, say "ch".
If the corresponding translation was not found for this language, we would return the default-language translation, and if that was not found either, we would return "[" + id + "]" so the tester knows something is missing in the database and can go to the web screen to edit the translation table and add it.
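The lookup described above could be sketched roughly as follows. This is an assumption about how such a function might look, not the original code: the Translations class, its method names, and the nested-map layout are invented for illustration, and loading the rows from the database table is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory view of the translation table: language -> (id -> text).
final class Translations {
    private final Map<String, Map<String, String>> byLang = new HashMap<>();
    private final String defaultLang;

    Translations(String defaultLang) {
        this.defaultLang = defaultLang;
    }

    /** Registers one row of the table (lang, id_text, translation). */
    void put(String lang, String id, String text) {
        byLang.computeIfAbsent(lang, k -> new HashMap<>()).put(id, text);
    }

    /** Requested language first, then the default language, then the bracketed id. */
    String get(String id, String lang) {
        String text = lookup(lang, id);
        if (text == null) {
            text = lookup(defaultLang, id);
        }
        return text != null ? text : "[" + id + "]";
    }

    private String lookup(String lang, String id) {
        Map<String, String> texts = byLang.get(lang);
        return texts == null ? null : texts.get(id);
    }

    public static void main(String[] args) {
        Translations t = new Translations("en");
        t.put("en", "hello", "Hello");
        t.put("en", "bye", "Goodbye");
        t.put("ch", "hello", "Ni hao");
        System.out.println(t.get("hello", "ch"));     // Ni hao
        System.out.println(t.get("bye", "ch"));       // Goodbye
        System.out.println(t.get("error1msg", "ch")); // [error1msg]
    }
}
```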

Format Java Code into Word / RTF

I need to format java code to put into a Word document. Are there any programs that will do this with keyword highlighting, etc. ?

When I copy/paste from my IDE (Eclipse), the formatting comes along for the ride.
You'll probably want to turn off "Mark Occurrences" first.

This is a late reply but since it's quite a specific requirement I'll post my comment anyway.
You can do this programmatically with Docmosis assuming you want the program to be running in Java (not just showing java in documents) and can install OpenOffice where the program runs. The process would be:
Create a doc or odt file that will act as a template (setting fonts, position, tables etc) and will have a placeholder for where you want to insert the code sample
Add docmosis to your java project and write the code to initialise Docmosis, register the template, then render document with your selected Java code.
Currently, Docmosis FieldRenderers can underline or italicize your data as it goes, but the rendering is currently applied to the entire field. So this wouldn't let you have a single field for all your java text and individually highlight words, but there are a few other tricks that you could employ to get useful/interesting results (such as splitting your data into separate fields and letting Docmosis render the fields differently).
The "java code" text that you specify as data will be inserted into your template using the font and layout properties in the template. The renderer will have a chance to override specific formatting.

You can just copy it and then paste it into the Word document. I am using OS X as well. It just works fine. I am uploading a screenshot of how it looks in Word.

I'm using Easy Code Formatter as called out here: How do you display code snippets in MS Word preserving format and syntax highlighting?
It's an Office add-in. You can select multiple themes, enable / disable line numbering / highlight lines in rectangles. It allows you to select the coding style / and has a quick formatting button. Pretty neat.
Requires you to have Office 2013 or beyond.
