Remove the text rise property from the first and last lines of a page (iText 7, Java)

I need to remove a property from a Text element (setTextRise) when the rise would push the text outside the page area.
PdfDocument pdfDoc = new PdfDocument(pdfWriter);
Document doc = new Document(pdfDoc, PageSize.A5);
doc.setMargins(0,0,0,36);
for (int i = 0; i <50 ; i++) {
Text t = new Text("hello " + i);
if(i ==0){
t.setTextRise(7);
}
if(i==31){
t.setTextRise(-35);
}
Paragraph p = new Paragraph(t);
p.setNextRenderer(new ParagraphRen(p,doc));
p.setFixedLeading(fixedLeading);
doc.add(p);
}
doc.close();
}
class ParagraphRen extends ParagraphRenderer{
private float heightDoc;
private float marginTop;
private float marginBot;
public ParagraphRen(Paragraph modelElement, Document doc) {
super(modelElement);
this.heightDoc =doc.getPdfDocument().getDefaultPageSize().getHeight();
this.marginTop = doc.getTopMargin();
this.marginBot = doc.getBottomMargin();
}
@Override
public void drawChildren(DrawContext drawContext) {
super.drawChildren(drawContext);
Rectangle rect = this.getOccupiedAreaBBox();
List<IRenderer> childRenderers = this.getChildRenderers();
//check first line
if(rect.getTop()<=heightDoc- marginTop) {
for (IRenderer iRenderer : childRenderers) {
if (iRenderer.getModelElement().hasProperty(72)) {
Object property = iRenderer.getModelElement().getProperty(72);
float v = (Float) property + rect.getTop();
//check text more AreaPage
if(v >heightDoc){
iRenderer.getModelElement().deleteOwnProperty(72);
}
}
}
}
//check last line
if(rect.getBottom()-marginBot-rect.getHeight()*2<0){
for (IRenderer iRenderer : childRenderers) {
if (iRenderer.getModelElement().hasProperty(72)) {
Object property = iRenderer.getModelElement().getProperty(72);
//if setRise(-..) more margin bottom setRise remove
if(rect.getBottom()-marginBot-rect.getHeight()+(Float) property<0)
iRenderer.getModelElement().deleteOwnProperty(72);
}
}
}
}
}
Here I check whether the first line with a text rise goes beyond the page area; if so, I remove the text rise property.
And if the last line with setTextRise(-35) goes below the bottom margin, I remove it as well.
But it doesn't work: the properties are not removed.

The problem is that drawChildren is called after layout has already been done. At this stage iText generally no longer consults element properties: it just places each element in its occupied area, which was calculated earlier, in the layout() stage.
You can work around this by emulating the layout.
Add all your paragraphs to a Div rather than directly to the document, then emulate adding this Div to the document:
LayoutResult result = div.createRendererSubTree().setParent(doc.getRenderer()).layout(new LayoutContext(new LayoutArea(0, PageSize.A5)));
The snippet above lays out the Div on an A5-sized area without adding it to the document.
You can now inspect the layout result and change some elements before they are processed for real with Document#add. For example, to get the laid-out paragraph at index 30 one can use:
((DivRenderer)result.getSplitRenderer()).getChildRenderers().get(30);
Some more tips:
The split renderer represents the part of the content that iText can place in the given area; the overflow renderer represents the content that does not fit.
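Once the dry-run layout has produced the position of each line, the decision the question is after (keep the text rise or drop it) is a simple bounds check. Below is a minimal sketch of that check in plain Java. RiseCheck and its parameters are hypothetical names, not part of iText; in real code the line coordinates would presumably come from the occupied areas of the child renderers in the layout result.

```java
// Hypothetical helper for illustration; not part of iText.
final class RiseCheck {

    /** True if a line shifted vertically by `rise` still lies between areaBottom and areaTop. */
    static boolean riseFits(float lineBottom, float lineTop, float rise,
                            float areaBottom, float areaTop) {
        return lineTop + rise <= areaTop && lineBottom + rise >= areaBottom;
    }

    /** Returns the requested rise if the shifted line fits, otherwise 0 (rise dropped). */
    static float effectiveRise(float lineBottom, float lineTop, float rise,
                               float areaBottom, float areaTop) {
        return riseFits(lineBottom, lineTop, rise, areaBottom, areaTop) ? rise : 0f;
    }
}
```

With this, setTextRise would only be called on the real element when effectiveRise returns a non-zero value, so nothing has to be removed after the fact.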

Paragraph height is incorrectly calculated itext 7

If I calculate the height of each element and compare the running total with the page height, only 4 elements fit:
PdfDocument pdf = new PdfDocument(new PdfWriter(DEST));
Document document = new Document(pdf);
pdf.setDefaultPageSize(PageSize.A5);
document.setMargins(0, 25, 25, 25);
float maxHeight = document.getPdfDocument().getDefaultPageSize().getWidth() ;/* (mainPdf_model.getLeftMargin() +mainPdf_model.getRightMargin());*/
float height = 0;
String line = "Hello! Welcome to iTextPdf";
Div div = new Div();
for (int i = 0; i < 30; i++) {
Paragraph element = new Paragraph();
element.add(line + " " + i);
element.setMargin(0);
element.setPadding(0);
element.setFixedLeading(100);
div.add(element);
IRenderer rendererSubTree = element.createRendererSubTree();
LayoutResult result = rendererSubTree.setParent(document.getRenderer()).
layout(new LayoutContext(new LayoutArea(1, new Rectangle(10000, 1000))));
height+=result.getOccupiedArea().getBBox().getHeight();
if(height<maxHeight) {
System.out.println(element);
}else {
}
}
document.add(div);
document.close();
Console output:
com.itextpdf.layout.element.Paragraph@319b92f3
com.itextpdf.layout.element.Paragraph@fcd6521
com.itextpdf.layout.element.Paragraph@27d415d9
com.itextpdf.layout.element.Paragraph@5c18298f
In the PDF I have 6 elements.
First of all, you have a bug in your code:
float maxHeight = document.getPdfDocument().getDefaultPageSize().getWidth();
should be replaced with
float maxHeight = document.getPdfDocument().getDefaultPageSize().getHeight();
Once you fix that, your console output will have 5 lines instead of 4. This is still not the 6 lines you have on your page.
The difference comes from the fact that the paragraph placement logic of a document does additional calculations and shifts the first paragraph. To avoid a lot of empty space at the start of the page, the first paragraph uses roughly half of the leading instead of the full leading. For precise calculations, add the elements to your Div and calculate the height of the Div with several Paragraph elements inside. To make the code more efficient, you can use binary search on the number of elements that fit on a single page.
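The binary search mentioned above could look roughly like this. It is a sketch in plain Java, not iText code: FitSearch and heightOf are hypothetical names, and heightOf(n) stands for "lay out a Div with the first n paragraphs and return its occupied height", which is assumed to be non-decreasing in n.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper for illustration; not part of iText.
final class FitSearch {

    /**
     * Returns the largest n in [0, maxCount] with heightOf(n) <= available,
     * assuming heightOf is non-decreasing in n and that 0 elements always fit.
     */
    static int maxFitting(int maxCount, double available, IntToDoubleFunction heightOf) {
        int lo = 0;
        int hi = maxCount;
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // upper middle, so the range always shrinks
            if (heightOf.applyAsDouble(mid) <= available) {
                lo = mid;      // mid elements fit, try more
            } else {
                hi = mid - 1;  // mid elements do not fit, try fewer
            }
        }
        return lo;
    }
}
```

Each probe costs one layout pass, so for 30 paragraphs this needs about 5 dry-run layouts instead of 30.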

Apache POI: ${my_placeholder} is treated as three different runs

I have a .docx template with placeholders to be filled, such as ${programming_language}, ${education}, etc.
The placeholder keywords must be easily distinguished from the other plain words, hence they are enclosed with ${ }.
for (XWPFTable table : doc.getTables()) {
for (XWPFTableRow row : table.getRows()) {
for (XWPFTableCell cell : row.getTableCells()) {
for (XWPFParagraph paragraph : cell.getParagraphs()) {
for (XWPFRun run : paragraph.getRuns()) {
System.out.println("run text: " + run.text());
/** replace text here, etc. */
}
}
}
}
}
I want to extract the placeholders together with the enclosing ${ } characters. The problem is that the enclosing characters seem to be treated as separate runs:
run text: ${
run text: programming_language
run text: }
run text: Some plain text here
run text: ${
run text: education
run text: }
Instead, I would like to achieve the following effect:
run text: ${programming_language}
run text: Some plain text here
run text: ${education}
I have tried using other enclosing characters, such as: { }, < >, # #, etc.
I do not want to do some weird concatenations of runs, etc. I want to have it in a single XWPFRun.
If I cannot find a proper solution, I will probably just name them like this: VAR_PROGRAMMING_LANGUAGE, VAR_EDUCATION.
Apache POI (as of 4.1.2) provides TextSegment to deal with these Word text-run issues. XWPFParagraph.searchText searches for a string in a paragraph and returns a TextSegment. This gives access to the begin run and the end run of that text in the paragraph (BeginRun and EndRun). It also gives the start character position in the begin run and the end character position in the end run (BeginChar and EndChar).
It additionally gives the index of the text element within the text run (BeginText and EndText). This should always be 0, because default text runs have only one text element.
With this, we can do the following:
Replace the found partial string in the begin run with the replacement. To do so, take the text that precedes the searched string and append the replacement to it. After that, the begin run fully contains the replacement.
Delete all text runs between the begin run and the end run, as they contain parts of the searched string that are no longer needed.
Keep only the text after the searched string in the end run.
This way we can replace text that is spread over multiple text runs.
The following example shows this.
import java.io.*;
import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
public class WordReplaceTextSegment {
static public void replaceTextSegment(XWPFParagraph paragraph, String textToFind, String replacement) {
TextSegment foundTextSegment = null;
PositionInParagraph startPos = new PositionInParagraph(0, 0, 0);
while((foundTextSegment = paragraph.searchText(textToFind, startPos)) != null) { // search all text segments having text to find
System.out.println(foundTextSegment.getBeginRun()+":"+foundTextSegment.getBeginText()+":"+foundTextSegment.getBeginChar());
System.out.println(foundTextSegment.getEndRun()+":"+foundTextSegment.getEndText()+":"+foundTextSegment.getEndChar());
// maybe there is text before textToFind in begin run
XWPFRun beginRun = paragraph.getRuns().get(foundTextSegment.getBeginRun());
String textInBeginRun = beginRun.getText(foundTextSegment.getBeginText());
String textBefore = textInBeginRun.substring(0, foundTextSegment.getBeginChar()); // we only need the text before
// maybe there is text after textToFind in end run
XWPFRun endRun = paragraph.getRuns().get(foundTextSegment.getEndRun());
String textInEndRun = endRun.getText(foundTextSegment.getEndText());
String textAfter = textInEndRun.substring(foundTextSegment.getEndChar() + 1); // we only need the text after
if (foundTextSegment.getEndRun() == foundTextSegment.getBeginRun()) {
textInBeginRun = textBefore + replacement + textAfter; // if we have only one run, we need the text before, then the replacement, then the text after in that run
} else {
textInBeginRun = textBefore + replacement; // else we need the text before followed by the replacement in begin run
endRun.setText(textAfter, foundTextSegment.getEndText()); // and the text after in end run
}
beginRun.setText(textInBeginRun, foundTextSegment.getBeginText());
// runs between begin run and end run needs to be removed
for (int runBetween = foundTextSegment.getEndRun() - 1; runBetween > foundTextSegment.getBeginRun(); runBetween--) {
paragraph.removeRun(runBetween); // remove not needed runs
}
}
}
public static void main(String[] args) throws Exception {
XWPFDocument doc = new XWPFDocument(new FileInputStream("source.docx"));
String textToFind = "${This is the text to find}"; // might be in different runs
String replacement = "Replacement text";
for (XWPFParagraph paragraph : doc.getParagraphs()) { //go through all paragraphs
if (paragraph.getText().contains(textToFind)) { // paragraph contains text to find
replaceTextSegment(paragraph, textToFind, replacement);
}
}
FileOutputStream out = new FileOutputStream("result.docx");
doc.write(out);
out.close();
doc.close();
}
}
The code above does not work in all cases because XWPFParagraph.searchText has bugs. So here is a better searchText method:
/**
* Parses the paragraph and searches for the given string.
* If the string is found, a TextSegment describing its position is returned;
* otherwise null is returned. The search starts at startPos.
*
* @param searched
* @param startPos
*/
static TextSegment searchText(XWPFParagraph paragraph, String searched, PositionInParagraph startPos) {
int startRun = startPos.getRun(),
startText = startPos.getText(),
startChar = startPos.getChar();
int beginRunPos = 0, candCharPos = 0;
boolean newList = false;
//CTR[] rArray = paragraph.getRArray(); //This does not contain all runs. It lacks hyperlink runs for ex.
java.util.List<XWPFRun> runs = paragraph.getRuns();
int beginTextPos = 0, beginCharPos = 0; //must be outside the for loop
//for (int runPos = startRun; runPos < rArray.length; runPos++) {
for (int runPos = startRun; runPos < runs.size(); runPos++) {
//int beginTextPos = 0, beginCharPos = 0, textPos = 0, charPos; //int beginTextPos = 0, beginCharPos = 0 must be outside the for loop
int textPos = 0, charPos;
//CTR ctRun = rArray[runPos];
CTR ctRun = runs.get(runPos).getCTR();
XmlCursor c = ctRun.newCursor();
c.selectPath("./*");
try {
while (c.toNextSelection()) {
XmlObject o = c.getObject();
if (o instanceof CTText) {
if (textPos >= startText) {
String candidate = ((CTText) o).getStringValue();
if (runPos == startRun) {
charPos = startChar;
} else {
charPos = 0;
}
for (; charPos < candidate.length(); charPos++) {
if ((candidate.charAt(charPos) == searched.charAt(0)) && (candCharPos == 0)) {
beginTextPos = textPos;
beginCharPos = charPos;
beginRunPos = runPos;
newList = true;
}
if (candidate.charAt(charPos) == searched.charAt(candCharPos)) {
if (candCharPos + 1 < searched.length()) {
candCharPos++;
} else if (newList) {
TextSegment segment = new TextSegment();
segment.setBeginRun(beginRunPos);
segment.setBeginText(beginTextPos);
segment.setBeginChar(beginCharPos);
segment.setEndRun(runPos);
segment.setEndText(textPos);
segment.setEndChar(charPos);
return segment;
}
} else {
candCharPos = 0;
}
}
}
textPos++;
} else if (o instanceof CTProofErr) {
c.removeXml();
} else if (o instanceof CTRPr) {
//do nothing
} else {
candCharPos = 0;
}
}
} finally {
c.dispose();
}
}
return null;
}
This will be called like:
...
while((foundTextSegment = searchText(paragraph, textToFind, startPos)) != null) {
...
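To make the begin/end bookkeeping easier to follow, here is a small model of the same idea in plain Java, without POI: the runs are represented by their texts only, the texts are concatenated, and the match offsets are mapped back to (run, character) positions. RunSpan is a hypothetical class for illustration; like TextSegment in the code above, its end position is inclusive.

```java
import java.util.List;

// Hypothetical model for illustration; not part of Apache POI.
final class RunSpan {
    final int beginRun, beginChar, endRun, endChar; // endChar is inclusive

    RunSpan(int beginRun, int beginChar, int endRun, int endChar) {
        this.beginRun = beginRun;
        this.beginChar = beginChar;
        this.endRun = endRun;
        this.endChar = endChar;
    }

    /** Finds the first occurrence of searched across the run texts, or returns null. */
    static RunSpan find(List<String> runTexts, String searched) {
        if (searched.isEmpty()) {
            return null;
        }
        StringBuilder all = new StringBuilder();
        for (String text : runTexts) {
            all.append(text);
        }
        int start = all.indexOf(searched);
        if (start < 0) {
            return null;
        }
        int[] begin = locate(runTexts, start);
        int[] end = locate(runTexts, start + searched.length() - 1);
        return new RunSpan(begin[0], begin[1], end[0], end[1]);
    }

    /** Maps an offset in the concatenated text to {run index, char index in that run}. */
    private static int[] locate(List<String> runTexts, int offset) {
        int remaining = offset;
        for (int run = 0; run < runTexts.size(); run++) {
            int length = runTexts.get(run).length();
            if (remaining < length) {
                return new int[] {run, remaining};
            }
            remaining -= length;
        }
        throw new IllegalArgumentException("offset beyond end of text: " + offset);
    }
}
```

For the runs "${", "programming_language", "}" from the question, searching for "${programming_language}" yields begin run 0 and end run 2, which is exactly the range the replacement code has to collapse.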
As someone commented on your question, you can't control where or when Word splits a paragraph into runs. If the other answer didn't help, here is how I got around it:
First of all, this "solution" has a big problem, but I will still post it here in case someone can improve it.
public void mainMethod(XWPFParagraph paragraph) {
if (paragraph.getRuns().size() > 1) {
String myRun = unifyRuns(paragraph.getRuns());
// make the verification of placeholders ${...}
paragraph.getRuns().get(0).setText(myRun, 0); // position 0 replaces the existing text instead of appending to it
while(paragraph.getRuns().size() > 1) {
paragraph.removeRun(1);
}
}
}
private String unifyRuns(List<XWPFRun> runElements) {
StringBuilder unifiedRun = new StringBuilder();
for (XWPFRun run : runElements) {
unifiedRun.append(run.text()); // append the text of the run, not the run object
}
return unifiedRun.toString();
}
The code may contain some errors since I'm writing it from memory.
The problem is that Word doesn't split paragraphs into runs for nothing: when a paragraph contains text with different formatting (such as font family or font size), Word puts the differently formatted parts in different runs.
In the text "Here's my bold text", Word will split the text to separate the bold part from the normal part. So the code above is a bad solution if you are using POI to create large documents with several kinds of formatting. In that case you would first need to check whether the run is actually in bold, and only then handle the placeholders.
Again, this is a "solution" that I found, and it's not complete yet. Sorry for any English errors; I'm using Google Translate to write this answer.
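One possible way around the formatting problem described above is to merge only neighbouring runs that share the same formatting. The sketch below models this in plain Java with a single formatting flag (bold). StyledRun and RunMerger are hypothetical classes, not POI types; with POI one would have to compare the real run properties instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text run: its text plus one formatting flag.
final class StyledRun {
    final String text;
    final boolean bold;

    StyledRun(String text, boolean bold) {
        this.text = text;
        this.bold = bold;
    }
}

// Hypothetical helper for illustration; not part of Apache POI.
final class RunMerger {

    /** Merges adjacent runs with equal formatting; runs with different formatting stay separate. */
    static List<StyledRun> mergeSameStyle(List<StyledRun> runs) {
        List<StyledRun> merged = new ArrayList<>();
        for (StyledRun run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).bold == run.bold) {
                StyledRun previous = merged.get(last);
                merged.set(last, new StyledRun(previous.text + run.text, run.bold));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }
}
```

A placeholder that Word split into "${", "education" and "}" ends up in one run again, while a bold word next to it keeps its own run.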

How to highlight text within a paragraph in MS Word using POI

I am developing a compare tool for Word documents. Whenever the two documents differ, I need to highlight the differing substring in the paragraph. When I try to highlight using a run, the whole paragraph is highlighted instead of the substring.
Can you please advise how I can achieve this for a substring?
I had the same problem. Here is a sample method that highlights a substring contained in a run.
private int highlightSubsentence(String sentence, XWPFParagraph p, int i) {
//get the current run Style - here I might need to save the current style
XWPFRun currentRun = p.getRuns().get(i);
String currentRunText = currentRun.text();
int sentenceLength = sentence.length();
int sentenceBeginIndex = currentRunText.indexOf(sentence);
int addedRuns = 0;
p.removeRun(i);
//Create, if necessary, a run before the highlight part
if (sentenceBeginIndex > 0) {
XWPFRun before = p.insertNewRun(i);
before.setText(currentRunText.substring(0, sentenceBeginIndex));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
// highlight the interesting part
XWPFRun sentenceRun = p.insertNewRun(i + addedRuns);
sentenceRun.setText(currentRunText.substring(sentenceBeginIndex, sentenceBeginIndex + sentenceLength));
currentStyle.copyStyle(sentenceRun);
CTShd cTShd = sentenceRun.getCTR().addNewRPr().addNewShd();
cTShd.setFill("00FFFF");
//Create, if necessary, a run after the highlight part
if (sentenceBeginIndex + sentenceLength != currentRunText.length()) {
XWPFRun after = p.insertNewRun(i + addedRuns + 1);
after.setText(currentRunText.substring(sentenceBeginIndex + sentenceLength));
//here I might need to re-introduce the style of the deleted run
addedRuns++;
}
return addedRuns;
}
You might need to save the formatting of the run you delete so that the new runs keep the old formatting.
Also, if the string you need to highlight is spread over more than one run, you will need to highlight all of them, but the core method is the one I posted.
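The splitting step at the heart of that method can be looked at in isolation. The following plain-Java sketch cuts a run's text into the part before the match, the match itself, and the part after it. SubstringSplit is a hypothetical helper, not part of POI; unlike the method above, it also handles the case where the substring is not found.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Apache POI.
final class SubstringSplit {

    /**
     * Returns up to three pieces: the text before the match, the match, and the
     * text after it. Empty pieces are omitted. If there is no match, the
     * original text is returned as the only piece.
     */
    static List<String> splitAround(String text, String sentence) {
        List<String> parts = new ArrayList<>();
        int begin = sentence.isEmpty() ? -1 : text.indexOf(sentence);
        if (begin < 0) {
            parts.add(text);
            return parts;
        }
        int end = begin + sentence.length();
        if (begin > 0) {
            parts.add(text.substring(0, begin));
        }
        parts.add(text.substring(begin, end));
        if (end < text.length()) {
            parts.add(text.substring(end));
        }
        return parts;
    }
}
```

Each returned piece corresponds to one new run; only the piece equal to the searched string gets the shading.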
On the style question:
I had a class Style that saved all the styles of the old run in private fields (for the respective types, look at what the XWPFRun getters return).
These are the sub-styles that I needed. There are obviously others that I didn't cover.
Style(XWPFRun run, XWPFDefaultRunStyle defaultRunStyle) {
fontSize = run.getFontSize();
fontFamily = run.getFontFamily();
bold = run.isBold();
italic = run.isItalic();
strike = run.isStrikeThrough();
underline = run.getUnderline();
color = run.getColor();
shadingColor = getShadeColor(run);
highlightColor = getHighlightedColor(run);
}
I copied the old style in the new run when needed.
public void copyStyle(XWPFRun newRun) {
if (fontSize != -1) {
newRun.setFontSize(fontSize);
}
newRun.setFontFamily(fontFamily);
newRun.setBold(bold);
newRun.setItalic(italic);
newRun.setStrikeThrough(strike);
newRun.setColor(color);
newRun.setUnderline(underline);
if (shadingColor != null) {
addShading(newRun, shadingColor);
}
if (highlightColor != null) {
addHighlight(newRun, highlightColor);
}
}
To re-apply shading and highlighting that were already present I used:
public static void addHighlight(XWPFRun run, STHighlightColor.Enum hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getHighlight() == null) {
run.getCTR().getRPr().addNewHighlight();
}
run.getCTR().getRPr().getHighlight().setVal(hexColor);
}
public static void addShading(XWPFRun run, Object hexColor) {
if (run.getCTR().getRPr() == null) {
run.getCTR().addNewRPr();
}
if (run.getCTR().getRPr().getShd() == null) {
run.getCTR().getRPr().addNewShd();
}
run.getCTR().getRPr().getShd().setFill(hexColor);
}

Apache POI: How do you restart numbering on a numbered list in word document?

I'm trying to use Apache POI XWPF library to produce a report in a Word docx file.
My approach is to use an existing Word Document as a Styles template. Within the template I defined a style named "SRINumberList".
So to load the template and remove everything that's not in the Header or Footer:
protected void createDocFromTemplate() {
try {
document = new XWPFDocument(this.getClass().getResourceAsStream(styleTemplate));
int pos = document.getBodyElements().size()-1;
while (pos >= 0) {
IBodyElement element = document.getBodyElements().get(pos);
if (!EnumSet.of(BodyType.HEADER, BodyType.FOOTER).contains(element.getPartType())) {
boolean success = document.removeBodyElement(pos);
logger.log(Level.INFO, "Removed body element "+pos+": "+success);
}
pos--;
}
} catch (IOException e) {
logger.log(Level.WARNING, "Not able to load style template", e);
document = new XWPFDocument();
}
}
Now within my document there are several different sections that contain numbered lists. Each list should restart numbering from 1. This is the typical way I'm doing this:
if (itemStem.getItems().size() > 0) {
p = document.createParagraph();
p.setStyle(ParaStyle.StemAndItemTitle.styleId);
final BigInteger bulletNum = newBulletNumber();
run = p.createRun();
run.setText("Sub Items");
itemStem.getItems().stream().forEach(item -> {
XWPFParagraph p2 = document.createParagraph();
p2.setStyle(ParaStyle.NumberList.styleId);
XWPFRun run2 = p2.createRun();
run2.setText(item.getSubItemText());
});
p = document.createParagraph();
p.createRun();
}
This correctly applies the style that contains the number format, but there is only a single sequence (1 to however many list items exist in the document). For example:
Heading 1
1. item a
2. item b
3. item c
Heading 2
4. item a
5. item d
6. item g
But what I want is:
Heading 1
1. item a
2. item b
3. item c
Heading 2
1. item a
2. item d
3. item g
So basically I'm trying to figure out how to use the style I have but restart numbering at various spots in the document. Can someone provide a sample of how this would work?
The only way I found is to override the level in the CTNum. Another way could be to create lots of new abstract numberings/styles, but that would add lots of style entries that show up when you open the document.
ArrayList<String> list = new ArrayList<String>();
list.add("SubItem 1");
list.add("SubItem 2");
list.add("SubItem 3");
XWPFNumbering numbering = document.getNumbering();
XWPFAbstractNum numAbstract = numbering.getAbstractNum(BigInteger.ONE);
for (Integer nx = 1; nx < 3; nx++) {
XWPFParagraph p = document.createParagraph();
XWPFRun run = p.createRun();
run.setText("Items " + nx.toString());
//leveloverride (start the new numbering)
BigInteger numId = numbering.addNum(numAbstract.getAbstractNum().getAbstractNumId());
XWPFNum num = numbering.getNum(numId);
CTNumLvl lvloverride = num.getCTNum().addNewLvlOverride();
lvloverride.setIlvl(BigInteger.ZERO);
CTDecimalNumber number = lvloverride.addNewStartOverride();
number.setVal(BigInteger.ONE);
for (String item : list) {
XWPFParagraph p2 = document.createParagraph();
p2.setNumID(num.getCTNum().getNumId());
CTNumPr numProp = p2.getCTP().getPPr().getNumPr();
numProp.addNewIlvl().setVal(BigInteger.ZERO);
XWPFRun run2 = p2.createRun();
run2.setText(item);
}
}
With some help from keil, I figured out the solution. I've posted a full working sample here: https://github.com/jimklo/apache-poi-sample
The trick is that you need to reference the AbstractNum of the numbering style defined in the document when creating a new Num that restarts the numbering.
Here are the highlights. The key was determining the AbstractNum ID for the style inside the document. It seems unfortunate, given that this is just an XML document, that there isn't some way to enumerate the existing Nums and AbstractNums. If there is, I'd love to know how.
/**
* first discover all the numbering styles defined in the template.
* a bit brute force since I can't find a way to just enumerate all the
* abstractNum's inside the numbering.xml
*/
protected void initNumberingStyles() {
numbering = document.getNumbering();
BigInteger curIdx = BigInteger.ONE;
XWPFAbstractNum abstractNum;
while ((abstractNum = numbering.getAbstractNum(curIdx)) != null) {
if (abstractNum != null) {
CTString pStyle = abstractNum.getCTAbstractNum().getLvlArray(0).getPStyle();
if (pStyle != null) {
numberStyles.put(pStyle.getVal(), abstractNum);
}
}
curIdx = curIdx.add(BigInteger.ONE);
}
}
Now that we have a mapping from the Style to the AbstractNum, we can create a new Num that restarts via a LvlOverride and StartOverride.
/**
* This creates a new num based upon the specified numberStyle
* @param numberStyle
* @return
*/
private XWPFNum restartNumbering(String numberStyle) {
XWPFAbstractNum abstractNum = numberStyles.get(numberStyle);
BigInteger numId = numbering.addNum(abstractNum.getAbstractNum().getAbstractNumId());
XWPFNum num = numbering.getNum(numId);
CTNumLvl lvlOverride = num.getCTNum().addNewLvlOverride();
lvlOverride.setIlvl(BigInteger.ZERO);
CTDecimalNumber number = lvlOverride.addNewStartOverride();
number.setVal(BigInteger.ONE);
return num;
}
And now you can just apply that NumID to the list you're creating.
/**
* This creates a five item list with a simple heading, using the specified style.
* @param index
* @param styleName
*/
protected void createStyledNumberList(int index, String styleName) {
XWPFParagraph p = document.createParagraph();
XWPFRun run = p.createRun();
run.setText(String.format("List %d: - %s", index, styleName));
// restart numbering
XWPFNum num = restartNumbering(styleName);
for (int i=1; i<=5; i++) {
XWPFParagraph p2 = document.createParagraph();
// set the style for this paragraph
p2.setStyle(styleName);
// set numbering for paragraph
p2.setNumID(num.getCTNum().getNumId());
CTNumPr numProp = p2.getCTP().getPPr().getNumPr();
numProp.addNewIlvl().setVal(BigInteger.ZERO);
// set the text
XWPFRun run2 = p2.createRun();
run2.setText(String.format("Item #%d using '%s' style.", i, styleName));
}
// some whitespace
p = document.createParagraph();
p.createRun();
}
Again, overall I wouldn't have figured this out without the pointer that keil provided.
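As a mental model of what both answers do, one can think of every Num as a separate counter that shares its formatting with the AbstractNum, and of the start override as the value that counter begins with. The plain-Java sketch below only illustrates that model. NumberingModel is a hypothetical class, not POI code, and it deliberately ignores levels and number formats.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model for illustration; not part of Apache POI.
final class NumberingModel {
    private final Map<Integer, Integer> nextValueByNumId = new HashMap<>();
    private int lastNumId = 0;

    /** Creates a new numbering instance whose first item gets the value `start`. */
    int addNum(int start) {
        int numId = ++lastNumId;
        nextValueByNumId.put(numId, start);
        return numId;
    }

    /** Returns the label of the next item of the given instance and advances its counter. */
    String nextLabel(int numId) {
        Integer value = nextValueByNumId.get(numId);
        if (value == null) {
            throw new IllegalArgumentException("unknown numId: " + numId);
        }
        nextValueByNumId.put(numId, value + 1);
        return value + ".";
    }
}
```

In this model, paragraphs that use the same numId continue one sequence, while a list created with a fresh addNum(1) starts at 1 again. This mirrors why each list in the answers gets its own Num with a start override of 1.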

Synchronizing Caret Position with Text in a Corresponding String

I am in the process of adding some html based formatting features to a JTextPane. The idea is that the user would select text in the JTextPane, then click a button (bold, italic etc) to insert the html tags at the appropriate locations. I can do this without difficulty using the JTextPane.getSelectionStart() and .getSelectionEnd() methods.
My problem is that I also want to scan each character in the JTextPane to index all the html tag locations - this is so the software can detect where the JTextPane caret is in relation to the html tags. This information is then used if the user wants to remove the formatting tags.
I am having difficulty synchronising this character index with the caret position in the JTextPane. Here is the code I have been using:
public void scanHTML(){
try {
boolean blnDocStartFlag = false;
alTagRecords = new ArrayList(25);
alTextOnlyIndex = new ArrayList();
String strTagBuild = "";
int intTagIndex = 0; // The index for a tag pair record in alTagRecords.
int intTextOnlyCount = 0; // Counts each text character, ignoring all html tags.
// Loop through HTMLDoc character array:
for (int i = 0; i <= strHTMLDoc.length() -1; i ++){
// Look for the "<" angle bracket enclosing the tag keyword ...
if (strHTMLDoc.charAt(i) == '<'){// It is a html tag ...
int intTagStartLocation = i; // this value will go into alTagFields(?,0) later ...
while (strHTMLDoc.charAt(i) != '>'){
strTagBuild += strHTMLDoc.charAt(i);
i ++; // continue incrementing the iterator whilst in this sub loop ...
}
strTagBuild += '>'; // makes sure the closing tag is not missed from the string
if (!strTagBuild.startsWith("</")){
// Create new tag record:
ArrayList<Integer> alTagFields = new ArrayList(3);
alTagFields.add(0, intTagStartLocation); // Tag start location index ...
alTagFields.add(1, -1); // Tag end not known at this stage ...
alTagFields.add(2, getTagType(strTagBuild));
alTagRecords.add(intTagIndex, alTagFields); // Tag Type
System.out.println("Tag: " + strTagBuild);
intTagIndex ++; // Increment the tag records index ...
} else { // find corresponding start tag and store its location in the appropriate field of alTagFields:
int intManipulatedTagIndex = getMyOpeningTag(getTagType(strTagBuild));
ArrayList<Integer> alManipulateTagFields = alTagRecords.get(intManipulatedTagIndex);
alManipulateTagFields.set(1, (intTagStartLocation + strTagBuild.length() -1) ); // store the position of the end angled bracket of the closing tag ...
alTagRecords.set(intManipulatedTagIndex, alManipulateTagFields);
System.out.println("Tag: " + strTagBuild);
}
strTagBuild = "";
} else {
// Create the text index:
if (blnDocStartFlag == false){
int intAscii = (int) strHTMLDoc.charAt(i);
if (intAscii >= 33){ // Ascii character 33 is an exclamation mark(!). It is the first character after a space.
blnDocStartFlag = true;
}
}
// Has the first non space text character has been reached? ...
if (blnDocStartFlag == true){ // Index the character if it has ...
alTextOnlyIndex.add(i);
intTextOnlyCount ++;
}
}
}
} catch (Exception ex){
System.err.println("Error at HTMLTagIndexer.scanHTML: " + ex);
}
}
The problem with the code above is that the string variable strHTMLDoc is obtained using JTextPane.getText, and this appears to have inserted some extra space characters within the string. Consequently this has put it out of sync with the corresponding caret position in the text pane.
Can anybody suggest an alternative way to do what I am trying to achieve?
Many thanks
