HtmlUnit - getByXPath with unknown element type - java

I'm using HtmlUnit to scrape data and I'm getting used to the syntax of XPath.
However I've run into a problem.
The element I need to pull varies between pages: sometimes it is a "span" element and sometimes it is an "a" element (a link), depending on whether the item I am scraping has a link or is just plain text.
What stays the same, however, is an attribute called "data-reactid", which always has a fixed value, let's say 99.
I've been reading and messing around, and have been trying things like this:
HtmlElement element = (HtmlElement) myPage.getFirstByXPath("//@data-reactid='99'");
System.out.println(element.getTextContent());
I am getting the following error:
java.lang.ClassCastException: java.lang.Boolean cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlElement
Why getFirstByXPath() is returning a boolean is beyond me.
So my question is, how can I access an element by a specified attribute and value, when I do not know what type the element will be?
Thanks!

It's giving you a boolean because your XPath is asking for a boolean. Your XPath,
//@data-reactid='99'
is asking the question "does there exist a data-reactid attribute anywhere in my document with a value of 99?"
What you want is a predicate -- that is, "select elements where this logical condition is true". For all elements (we'll use a * wildcard since we don't know the name) that have a @data-reactid of 99:
//*[@data-reactid = '99']
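To make the difference between the two expressions concrete, here is a minimal sketch. It uses the XPath engine that ships with the JDK (javax.xml.xpath) rather than HtmlUnit, so it runs without extra dependencies; the class name, method names and sample markup are made up for illustration. With HtmlUnit the same predicate expression would simply be passed to getFirstByXPath().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of HtmlUnit.
class XPathPredicateDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // A comparison expression evaluates to a boolean, not to an element.
    static boolean anyAttributeEquals99(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate("//@data-reactid='99'", doc, XPathConstants.BOOLEAN);
    }

    // A predicate selects the element itself, whatever its tag name is.
    static String textOfMatch(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate("//*[@data-reactid='99']", doc, XPathConstants.NODE);
        return node == null ? null : node.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Sample markup: the item is a link here, but a span would match as well.
        Document doc = parse("<div><span data-reactid=\"1\">x</span>"
                + "<a data-reactid=\"99\" href=\"#\">Some item</a></div>");
        System.out.println(anyAttributeEquals99(doc));
        System.out.println(textOfMatch(doc));
    }
}
```

Because the predicate version uses the * wildcard, it does not matter whether the matching element is a span or an a.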


Remove certain element from array in Velocity Template Language (VTL)

I would like to remove a certain element from an array in Velocity Template Language. I did not find an appropriate method in the Apache VTL documentation, which is why I am asking here for help. I have tried the following (.remove() doesn't seem to be a method on array items):
#set($linkedWIARRAY = ["ABC-123, DEF-345, GHI-678"])
#set($dummy=$linkedWIARRAY.add("JKL-901"))
#set($dummy = $linkedWIARRAY.remove("DEF-345"))
$linkedWIARRAY
$linkedWIARRAY returns [ABC-123, DEF-345, GHI-678, JKL-901], showing that remove very likely doesn't exist as a method on arrays ;)
There is a similar question on SO that didn't help me:
velocity template drop element from array
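The problem lies in the initialization of the list. It should be:
#set($linkedWIARRAY = ["ABC-123", "DEF-345", "GHI-678"])
That is, each string should be enclosed in its own double quotes, rather than one pair of quotes around the whole comma-separated text. As written in the question, the list holds a single element, "ABC-123, DEF-345, GHI-678", so remove("DEF-345") finds nothing to remove.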
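The same effect can be reproduced in plain Java. This is only a sketch, assuming (as in stock Velocity) that a VTL list literal is backed by a java.util.ArrayList; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class mirroring the two VTL list literals.
class VelocityListDemo {

    // Mirrors ["ABC-123, DEF-345, GHI-678"]: ONE element that contains commas.
    static List<String> oneBigString() {
        List<String> list = new ArrayList<>();
        list.add("ABC-123, DEF-345, GHI-678");
        return list;
    }

    // Mirrors ["ABC-123", "DEF-345", "GHI-678"]: three separate elements.
    static List<String> separateStrings() {
        List<String> list = new ArrayList<>();
        list.add("ABC-123");
        list.add("DEF-345");
        list.add("GHI-678");
        return list;
    }

    public static void main(String[] args) {
        List<String> wrong = oneBigString();
        wrong.add("JKL-901");
        // No element equals "DEF-345", so nothing is removed.
        System.out.println(wrong.remove("DEF-345") + " " + wrong);
        // prints: false [ABC-123, DEF-345, GHI-678, JKL-901]

        List<String> right = separateStrings();
        right.add("JKL-901");
        System.out.println(right.remove("DEF-345") + " " + right);
        // prints: true [ABC-123, GHI-678, JKL-901]
    }
}
```

The first printed list looks identical to a four-element list, which is why the output in the question was misleading.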

How can I use a regex in an XPath (contains text) in Java (Selenium) for a CaseID

Example A:
I need to make the locator below as dynamic as possible, so that it is robust and flexible. It is an upload button, but the element's id value tends to change over time:
xpath="//input[@id='j_idt162:input']"
so I tried the following:
xpath="//input[@id='j_idt[0-9],{1,4}:input']"
Example B:
I have a list of case IDs and only need to get one of them; it doesn't matter whether it comes from the top or the bottom. Instead of using the static locator below
xpath = "//a[contains(.,'3131')]"
I tried this:
xpath = "//a[contains(.,'^[0-9]{1,5}$')]", "index:=0"
Neither of the above works. In Example A, I tried to match a number of up to 4 digits, since the number is dynamic.
In Example B, I'm trying to pick up only the first link whose text is a number of up to 5 digits, for instance 12345.
Thanks in advance for answering.
It's not possible to use Regex in XPath 1.0, and most browsers support XPath 1.0 only.
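It is better to rely on a stable prefix and suffix in the class or id, and then use starts-with(), ends-with() or contains() to find the element. Note that ends-with() is an XPath 2.0 function; in XPath 1.0 it has to be emulated with substring() and string-length().
For example, in XPath 2.0:
//*[starts-with(@id, 'sometext') and ends-with(@id, '_text')]
and the XPath 1.0 equivalent:
//*[starts-with(@id, 'sometext') and substring(@id, string-length(@id) - string-length('_text') + 1) = '_text']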
Check this answer.
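Here is a rough sketch of both ideas applied to the two examples from the question. It runs against the JDK's built-in XPath 1.0 engine (javax.xml.xpath) instead of Selenium so that it is self-contained; the class name, method names and sample markup are assumptions for illustration. Since ends-with() is not available in XPath 1.0, the suffix check is emulated with substring() and string-length(). For Example B, the regex is applied in Java after XPath has selected the links.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; with Selenium the same expressions would go into By.xpath(...).
class XPathWithoutRegexDemo {

    // Link text made of 1 to 5 digits, as in Example B.
    private static final Pattern CASE_ID = Pattern.compile("[0-9]{1,5}");

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Example A: match ids such as j_idt162:input by prefix and suffix only.
    // This does not verify that the middle part consists of digits.
    static String findUploadInputId(Document doc) throws Exception {
        String expr = "//input[starts-with(@id, 'j_idt') and "
                + "substring(@id, string-length(@id) - string-length(':input') + 1) = ':input']";
        XPath xpath = XPathFactory.newInstance().newXPath();
        Element input = (Element) xpath.evaluate(expr, doc, XPathConstants.NODE);
        return input == null ? null : input.getAttribute("id");
    }

    // Example B: let XPath select every link, then apply the regex in Java.
    static String firstCaseId(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList links = (NodeList) xpath.evaluate("//a", doc, XPathConstants.NODESET);
        for (int i = 0; i < links.getLength(); i++) {
            String text = links.item(i).getTextContent().trim();
            if (CASE_ID.matcher(text).matches()) {
                return text;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<form><input id=\"search\"/><input id=\"j_idt987:input\"/>"
                + "<a>Help</a><a>3131</a><a>123456</a></form>");
        System.out.println(findUploadInputId(doc));
        System.out.println(firstCaseId(doc));
    }
}
```

In Selenium, the second part would correspond to calling findElements() with a broad locator and filtering the returned elements by getText() in Java.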

How to get count of Selenium XPath results

As of now I am getting the count of the number of matching results using listChanges.size() . How do I directly get the count without loading getChanges in the list?
By getChanges = By.xpath("//td[contains(@class,'blob-code blob-code-addition') or contains(@class,'blob-code blob-code-deletion')]");
List<WebElement> listChanges = driver.findElements(getChanges);
I found this(Count function in XPath) and I tried the below which does not work!
Integer getCount = By.xpath(count("//td[contains(@class,'blob-code blob-code-addition') or contains(@class,'blob-code blob-code-deletion')]"));
Looks like I have to do something like this.
Integer getCount = By.xpath("count(//td[contains(@class,'blob-code blob-code-addition') or contains(@class,'blob-code blob-code-deletion')])");
But the right hand side returns an object of type By
As alex says, size() is the way to go. However, I do have another suggestion for you.
As far as I know, the proper way to count elements is the WebDriver API with findElements(). Another way is to run JavaScript through executeScript(). I am not sure whether XPath can be evaluated from JavaScript for this, since XPath execution through JavaScript is not supported across all browsers right now. See this. However, I do think a CSS selector combined with JavaScript makes it a lot easier. See the following code:
String cssQuery = ".blob-code-addition, .blob-code-deletion";
String script = "return document.querySelectorAll('" + cssQuery + "').length;";
Object count = ((JavascriptExecutor)driver).executeScript(script);
System.out.println(count);
This prints:
26
You cannot get the count through a Selenium XPath locator, because an XPath expression passed to Selenium has to select actual elements on the page, whereas count() evaluates to a number.
Calling findElements() and then size(), as you are already doing, is how you count elements with the Selenium WebDriver.
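For comparison, here is a small sketch of count() evaluated outside Selenium, with the XPath engine bundled in the JDK (javax.xml.xpath). It is only meant to show that count() yields a number rather than a set of elements, which is why it does not fit By.xpath(). The class name, method name and sample table are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; this is the JDK XPath API, not Selenium.
class XPathCountDemo {

    static int countChangedCells(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String expr = "count(//td[contains(@class,'blob-code blob-code-addition')"
                + " or contains(@class,'blob-code blob-code-deletion')])";
        XPath xpath = XPathFactory.newInstance().newXPath();
        // count() returns an XPath number, which the JDK maps to a Double.
        Double result = (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
        return result.intValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><tr>"
                + "<td class=\"blob-code blob-code-addition\">+ added</td>"
                + "<td class=\"blob-code blob-code-deletion\">- removed</td>"
                + "<td class=\"blob-code blob-code-context\">unchanged</td>"
                + "</tr></table>";
        System.out.println(countChangedCells(xml)); // prints: 2
    }
}
```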

Selenium won't read the current input value

I'm running Selenium on a site that changes the value of a disabled input text box using jquery. Looking at the HTML, the value of the input box continues to say "Not Available" even though the value is obviously changed.
I can get the current value using Firebug with
$("#inputid").val()
but I get the value "Not Available" when I use my Selenium code:
driver.findElement(By.id("inputid")).getAttribute("value");
Any suggestions on how to get this value in Selenium? I want to avoid trying to use something like JavascriptExecutor but if that's the best solution it would be good to know.
I don't have access to the jQuery code so I can't help you there. Sorry :-/
If the value is changed by jQuery due to some DOM events, chances are your Selenium test is going to check for the new value too fast. You can get the value after it changes away from "Not Available" with something like this:
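// Needs org.openqa.selenium.{By, WebDriver} and
// org.openqa.selenium.support.ui.{ExpectedCondition, WebDriverWait}.
// Selenium 3 style constructor; Selenium 4 takes Duration.ofSeconds(10) instead of 10.
WebDriverWait wait = new WebDriverWait(driver, 10);
String value = wait.until(new ExpectedCondition<String>() {
    @Override
    public String apply(WebDriver driver) {
        String current = driver.findElement(By.id("inputid")).getAttribute("value");
        // Returning null tells wait.until to keep polling.
        if ("Not Available".equals(current)) {
            return null;
        }
        return current;
    }
});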
(Disclaimer: It's been ages since I've written Java code so I may have goofed in the code above.) The wait.until call will run the apply method repeatedly until it returns something other than null, waiting for at most 10 seconds. If the value has not changed by then, wait.until throws a TimeoutException. The value returned by wait.until is the value returned by the final call to apply, the one that ended the wait. In other words, it will return the new value of the element.
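If the polling behaviour is unclear, the following stripped-down sketch shows the general idea in plain Java. It is not Selenium's actual implementation; pollUntil and its parameters are hypothetical names chosen for illustration, and the "condition" here is just a Supplier that returns null until the value is ready.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating "call until non-null or timeout"; not Selenium code.
class PollUntilDemo {

    static <T> T pollUntil(Supplier<T> condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            T result = condition.get();
            if (result != null) {
                return result; // the last value produced ends the wait
            }
            if (System.currentTimeMillis() >= deadline) {
                throw new IllegalStateException("Timed out waiting for a non-null value");
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Simulates a field that only has its new value on the third check.
        String value = pollUntil(() -> ++calls[0] < 3 ? null : "New value", 1000, 10);
        System.out.println(value + " after " + calls[0] + " checks");
        // prints: New value after 3 checks
    }
}
```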
You say
Looking at the HTML, the value of the input box continues to say "Not Available" even though the value is obviously changed.
Yes, that's a quirk of the DOM. When you change the value of the input field, the value attribute on the element that represents the input field does not change. What changes is the value property on the element. This is why you have to do $("#inputid").val(), not $("#inputid").attr('value').

Lucene TermFrequenciesVector

What do I obtain if I call IndexReader.getTermFrequenciesVector(...) on an index created with the TermVector.YES option?
The documentation already answers this, as Xodorap notes in a comment.
The TermFreqVector object returned can retrieve which terms (words produced by your analyzer) a field contains and how many times each of those terms exists within that field.
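As a purely conceptual illustration of "which terms a field contains and how many times", here is a small plain-Java sketch. It does not use the Lucene API at all: a lower-casing whitespace split stands in for the analyzer, and the class and method names are made up.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Conceptual sketch only; a real TermFreqVector is produced by Lucene at index time.
class TermFrequencySketch {

    // Maps each term of one field to the number of times it occurs in that field.
    static Map<String, Integer> termFrequencies(String fieldText) {
        Map<String, Integer> frequencies = new TreeMap<>();
        // Stand-in for an analyzer: lower-case and split on whitespace.
        for (String token : fieldText.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                frequencies.merge(token, 1, Integer::sum);
            }
        }
        return frequencies;
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("the quick fox and the lazy fox"));
        // prints: {and=1, fox=2, lazy=1, quick=1, the=2}
    }
}
```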
You can cast the returned TermFreqVector to the interface TermPositionVector if you index the field using TermVector.WITH_OFFSETS, TermVector.WITH_POSITIONS or TermVector.WITH_POSITIONS_OFFSETS. This gives you access to GetTermPositions, which allows you to check where in the field the term exists, and GetOffsets, which allows you to check where in the original content the term originated from. The latter, combined with Store.YES, allows highlighting of matching terms in a search query.
Several contributed highlighters are available in the Contrib area on the Lucene homepage.
Alternatively, you can use the positions to implement proximity or first-occurrence score contributions, which highlighting won't help you with at all.
