caret position into the html of JEditorPane

caret position into the html of JEditorPane - java

The getCaretPosition method of JEditorPane gives an index into the text only part of the html control. Is there a possibility to get the index into the html text?
To be more specific suppose I have a html text (where | denotes the caret position)
abcd<img src="1.jpg"/>123|<img src="2.jpg"/>
Now getCaretPosition gives 8 while I would need 25 as a result to read out the filename of the image.

I had mostly the same problem and solved it with the following method (I used JTextPane, but it should be the same for JEditorPane):
public int getCaretPositionHTML(JTextPane pane) {
HTMLDocument document = (HTMLDocument) pane.getDocument();
String text = pane.getText();
String x;
Random RNG = new Random();
while (true) {
x = RNG.nextLong() + "";
if (text.indexOf(x) < 0) break;
}
try {
document.insertString(pane.getCaretPosition(), x, null);
} catch (BadLocationException ex) {
ex.printStackTrace();
return -1;
}
text = pane.getText();
int i = text.indexOf(x);
pane.setText(text.replace(x, ""));
return i;
}
It just assumes your JTextPane won't contain all possible Long values ;)

The underlying model of the JEditorPane (some subclass of StyledDocument, in your case HTMLDocument) doesn't actually hold the HTML text as its internal representation. Instead, it has a tree of Elements containing style attributes. It only becomes HTML once that tree is run through the HTMLWriter. That makes what you're trying to do kinda tricky! I could imagine putting some flag attribute on the character element that you're currently on, and then using a specially crafted subclass of HTMLWriter to write out until that marker and count the characters, but that sounds like something of an epic hack. There is probably an easier way to get what you want there, though it's a bit unclear to me what that actually is.

I had the same problem, and solved it with the following code:
editor.getDocument().insertString(editor.getCaretPosition(),"String to insert", null);

I don't think you can transform your caret to be able to count tags as characters. If your final aim is to read image filename, you should use :
HTMLEditorKit (JEditorPane.getEditorKitForContentType("text/html") );
For more information about utilisation see Oracle HTMLEditorKit documentation and this O'Reilly PDF that contains interesting examples.

Related

Get the list of object containing text matching a pattern

I'm currently working with the API Apache POI and I'm trying to edit a Word document with it (*.docx). A document is composed by paragraphs (in XWPFParagraph objects) and a paragraph contains text embedded in 'runs' (XWPFRun). A paragraph can have many runs (depending on the text properties, but it's sometimes random). In my document I can have specific tags which I need to replace with data (all my tags follows this pattern <#TAG_NAME#>)
So for example, if I process a paragraph containing the text Some text with a tag <#SOMETAG#>, I could get something like this
XWPFParagraph paragraph = ... // Get a paragraph from the document
System.out.println(paragraph.getText());
// Prints: Some text with a tag <#SOMETAG#>
But if I want to edit the text of that paragraph I need to process the runs and the number of runs is not fixed. So if I show the content of runs with that code:
System.out.println("Number of runs: " + paragraph.getRuns().size());
for (XWPFRun run : paragraph.getRuns()) {
System.out.println(run.text());
}
Sometimes it can be like this:
// Output:
// Number of runs: 1
// Some text with a tag <#SOMETAG#>
And other time like this
// Output:
// Number of runs: 4
// Some text with a tag
// <#
// SOMETAG
// #>
What I need to do is to get the first run containing the start of the tag and the indexes of the following runs containing the rest of the tag (if the tag is divided in many runs). I've managed to get a first version of that algorithm but it only works if the beginning of the tag (<#) and the end of the tag (#>) aren't divided. Here's what I've already done.
So what I would like to get is an algorithm capable to manage that problem and if possible get it work with any given tag (not necessarily <# and #>, so I could replace with something like this {{{ and this }}}).
Sorry if my English isn't perfect, don't hesitate to ask me to clarify any point you want.

Finally I found the answer myself, I totally changed my way of thinking my original algorithm (I commented it so it might help someone who could be in the same situation I was)
// Before using the function, I'm sure that:
// paragraph.getText().contains(surroundedTag) == true
private void editParagraphWithData(XWPFParagraph paragraph, String surroundedTag, String replacement) {
List<Integer> runsToRemove = new LinkedList<Integer>();
StringBuilder tmpText = new StringBuilder();
int runCursor = 0;
// Processing (in normal order) the all runs until I found my surroundedTag
while (!tmpText.toString().contains(surroundedTag)) {
tmpText.append(paragraph.getRuns().get(runCursor).text());
runsToRemove.add(runCursor);
runCursor++;
}
tmpText = new StringBuilder();
// Processing back (in reverse order) to only keep the runs I need to edit/remove
while (!tmpText.toString().contains(surroundedTag)) {
runCursor--;
tmpText.insert(0, paragraph.getRuns().get(runCursor).text());
}
// Edit the first run of the tag
XWPFRun runToEdit = paragraph.getRuns().get(runCursor);
runToEdit.setText(tmpText.toString().replaceAll(surroundedTag, replacement), 0);
// Forget the runs I don't to remove
while (runCursor >= 0) {
runsToRemove.remove(0);
runCursor--;
}
// Remove the unused runs
Collections.reverse(runsToRemove);
for (Integer runToRemove : runsToRemove) {
paragraph.removeRun(runToRemove);
}
}
So now I'm processing all runs of the paragraph until I found my surrounded tag, then I'm processing back the paragraph to ignore the first runs if I don't need to edit them.

Remove Hightlight matching String content

Ok, few days ago I made one post regarding to the remove of Hightlighted text in JTextArea:
Removing Highlight from specific word - Java
The thing is, that time I made one code to remove Hightlights macthing its size...but now I have a lot of words with the same size in my app and obviously the application isnt running right.
So I ask, Does anyone know a library or a way to do this removal macthing the content of each highlighted string?

You could write a method to get the text for a given highlighter:
private static String highlightedText(Highlight h, Document d) {
int start = h.getStartIndex();
int end = h.getEndIndex();
int length = end - start;
return d.getText(start, length);
}
Then your removeHighlights method would look like this:
public void removeHighlights(JTextComponent c, String toBlackOut) {
Highlighter highlighter = c.getHighlighter();
Highlighter.Highlight[] highlights = h.getHighlights();
Document d = c.getDocument();
for (Highlighter.Highlight h : highlights)
if (highlightedText(h, d).equals(toBlackOut) && h.getPainter() instanceof TextHighLighter)
highlighter.removeHighlight(h);
}

How many times a text appears in webpage - Selenium Webdriver

Hi I would like to count how many times a text Ex: "VIM LIQUID MARATHI" appears on a page using selenium webdriver(java). Please help.
I have used the following to check if a text appears in the page using the following in the main class
assertEquals(true,isTextPresent("VIM LIQUID MARATHI"));
and a function to return a boolean
protected boolean isTextPresent(String text){
try{
boolean b = driver.getPageSource().contains(text);
System.out.println(b);
return b;
}
catch(Exception e){
return false;
}
}
... but do not know how to count the number of occurrences...

The problem with using getPageSource(), is there could be id's, classnames, or other parts of the code which match your String, but those don't actually appear on the page. I suggest just using getText() on the body element, which will only return the page's content, and not HTML. If I'm understanding your question correctly, I think that is more what you are looking for.
// get the text of the body element
WebElement body = driver.findElement(By.tagName("body"));
String bodyText = body.getText();
// count occurrences of the string
int count = 0;
// search for the String within the text
while (bodyText.contains("VIM LIQUID MARATHI")){
// when match is found, increment the count
count++;
// continue searching from where you left off
bodyText = bodyText.substring(bodyText.indexOf("VIM LIQUID MARATHI") + "VIM LIQUID MARATHI".length());
}
System.out.println(count);
The variable count contains the number of occurrences.

There are two different ways to do this:
int size = driver.findElements(By.xpath("//*[text()='text to match']")).size();
This will tell the driver to find all of the elements that have the text, and then output the size.
The second way is to search the HTML, like you said.
int size = driver.getPageSource().split("text to match").length-1;
This will get the page source, the split the string whenever it finds the match, then counts the number of splits it made.

You can try to execute javascript expression using webdriver:
((JavascriptExecutor)driver).executeScript("yourScript();");
If you are using jQuery on your page you can use jQuery's selectors:
((JavascriptExecutor)driver).executeScript("return jQuery([proper selector]).size()");
[proper selector] - this should be selector that will match text you are searching for.

Try
int size = driver.findElements(By.partialLinkText("VIM MARATHI")).size();

Preserving lines with Jsoup

I am using Jsoup to get some data from html, I have this code:
System.out.println("nie jest");
StringBuffer url=new StringBuffer("http://www.darklyrics.com/lyrics/");
url.append(args[0]);
url.append("/");
url.append(args[1]);
url.append(".html");
//wyciaganie odpowiednich klas z naszego htmla
Document doc=Jsoup.connect(url.toString()).get();
Element lyrics=doc.getElementsByClass("lyrics").first();
Element tracks=doc.getElementsByClass("albumlyrics").first();
//Jso
//lista sciezek
int numberOfTracks=tracks.getElementsByTag("a").size();
Everything would be fine, I extracthe data I want, but when I do:
lyrics.text()
I get the text with no line breaks, so I am wondering how to leave line breaks in displayed text, I read other threads on stackoverflow on this matter but they weren't helpful, I tried to do something like this:
TextNode tex=TextNode.createFromEncoded(lyrics.text(), lyrics.baseUri());
but I can't get the text I want with line breaks. I looked at previous threads about this like,
Removing HTML entities while preserving line breaks with JSoup
but I can't get the effect I want. What should I do?
Edit: I got the effect I wanted but I don't think it is very good solution:
for (Node nn:listOfNodes)
{
String s=Jsoup.parse(nn.toString()).text();
if ((nn.nodeName()=="#text" || nn.nodeName()=="h3"))
{
buf.append(s+"\n");
}
}
Anyone got better idea?

You could get the text nodes (the text between <br />s) by checking if the node is an instance of TextNode. This should work out for you:
Document document = Jsoup.connect(url.toString()).get();
Element lyrics = document.select(".lyrics").first();
StringWriter buffer = new StringWriter();
PrintWriter writer = new PrintWriter(buffer);
for (Node node : lyrics.childNodes()) {
if (node.nodeName().equals("h3")) {
writer.println(((Element) node).text());
} else if (node instanceof TextNode) {
writer.println(((TextNode) node).text());
}
}
System.out.println(buffer.toString());
(please note that comparing the object's internal value should be done by equals() method, not ==; strings are objects, not primitives)
Oh, I also suggest to read their privacy policy.

DocumentListener slows down Document.setCharacterAttributes method?

this is my first question in this site, though is not the first time I enter to clear my doubts, awesome webpage. :)
I'm writing a java program that highlights code in a JTextPane and I'm changing the way highlights are done. I'm using a JTabbedPane to let the user edit more than one file at the same time and I used to perform document highlights using a Timer, now I've built a highlight queue that runs in a separate thread and implemented a DocumentListener that queues the documents as changes take place.
But I have a really big problem, if I add the document via DocumentListener, the Highlight process takes a really long time while if I add it in the main class by getting the document directly from the JTextPane, it takes just a few milliseconds.
I've performed multiple benchmarks in my code and found out that what takes so much time to be performed when the document is added from the DocumentListener is the method Document.setCharacterAttributes().
Here is the method that adds documents via DocumentListener:
// eventType: 0 - insertUpdate / 1- removeUpdate
private void queueChange(javax.swing.event.DocumentEvent e, int eventType){
StyledDocument doc = (StyledDocument) e.getDocument();
int changeLength = e.getLength();
int changeOffset = e.getOffset();
int length = doc.getLength();
String title = (String) doc.getProperty("title");
String text;
try {
text = doc.getText(0, length);
if (changeLength != 1) {
Element element = doc.getDefaultRootElement();
int startLn = element.getElement(element.getElementIndex(changeOffset)).getStartOffset();
int endLn = element.getElement(element.getElementIndex(changeOffset + changeLength)).getEndOffset() - 1;
Engine.addDocument(doc, startLn, endLn, title, text);
} else {
if(eventType == 1){
changeOffset = changeOffset - changeLength;
}
int startLn = text.lastIndexOf("\n", changeOffset) + 1;
int endLn = text.indexOf("\n", changeOffset);
if (endLn < 0) {
if (length != startLn) {
endLn = length;
Engine.addDocument(doc, startLn, endLn, title, text);
}
} else if (startLn != endLn && startLn < endLn) {
Engine.addDocument(doc, startLn, endLn, title, text);
}
}
} catch (BadLocationException ex) {
Engine.crashEngine();
}
}
If I add a document with 2k lines with this method, it takes ~1900 ms to highlight the whole document, while if I add the document to the highlight queue by using a caret listening method it takes ~500 ms.
Here's a part of the caret listening method that is used to highlight whole documents when they're loaded:
if (loadFile == true) {
isKey = false;
doc = edit[currentTab].Editor.getStyledDocument();
try {
Highlight.addDocument(doc, 0, doc.getLength(),
Scripts.getTitleAt(currentTab), doc.getText(0, doc.getLength()));
} catch (BadLocationException ex) {
ex.printStackTrace();
}
loadFile = false;
}
Note: the Highlight/Engine.addDocument() method has five parameters: (StyledDocument doc,int start, int end, String tabTitle, String docText). Start and end both indicate the region where highlighting is needed.
I will appreciate any help related to this problem cause I've been trying to solve it for a few days and I can't find anything similar on the Internet. :(
Btw, does anyone know the actual difference between Document.setCharacterAttributes and Document.setParagraphAttributes? :P

Maybe you have some kind of recursion in your code that is causing the problem. With the DocumentEvent you should only worry about additions and removals. You don't need to worry about changes since those are attribute changes.
Maybe you add some text which schedules the highlighting, but then when you change the attributes of the text you schedule another highllighting task.

You can try to set a flag indicating whether it's user changes or your API changes. In the beginning of the Engine.addDocument() set the flag to API state and reset it back after changes are done.
In your listener check the flag and skip changes from API.
You wrote " I use highlights the text by setting the character attributes of a portion of the Document, so the method is not inserting more text". I'm not sure it doesn't insert text. E.g. you have "it's a bold text piece" then you select the "bold" and change attributes to bold. Original element is separated and 3 new elements appear. I didn't test it but it might call insertUpdate() and removeUpdate()
does anyone know the actual difference between Document.setCharacterAttributes and Document.setParagraphAttributes?
There are paragraph and char attributes. Char attributes are font size, family, style, colors. Paragraph attributes are alignment, indentation, line spacing.
Actually paragraphs are char elements' parents.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

caret position into the html of JEditorPane - java

I had the same problem, and solved it with the following code: editor.getDocument().insertString(editor.getCaretPosition(),"String to insert", null);

Related

Get the list of object containing text matching a pattern

Remove Hightlight matching String content

How many times a text appears in webpage - Selenium Webdriver

Preserving lines with Jsoup

DocumentListener slows down Document.setCharacterAttributes method?

Categories

Resources