I have the HTML (partial) shown below. I want to find the element using:
org.jsoup.nodes.Element elem = doc.getElementById("date-2011-04-23");
But I always get null. Can anyone help me? As a check, I've also coded this using VB.NET, and there I can access this element.
<td class="" id="date-2011-04-23" data-week="3" data-wkday="6">...</td>
Assuming that your tag looks like:
<td class="" id="date-2011-04-23" data-week="3" data-wkday="6">...</td>
You can use the JSoup Selector API for this:
for( Element element : doc.select("#date-2011-04-23") )
{
// Do something here
}
If you need only the first Element:
Element element = doc.select("#date-2011-04-23").first();
The reason you're not finding that content in the HTML is that the schedule is loaded from a JSON file by the browser executing JavaScript, which then adds it to the browser DOM. Jsoup does not execute JavaScript, so it can only see what is in the source HTML.
If you use a debugging proxy like Charles (or the debugging network pane in Chrome / Firefox), you can see all the requests a browser makes to render a page. In this example, the schedule data is coming from http://mlb.mlb.com/gen/schedule/phi/2011_4.json
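If the JSON itself has the data you need, you can fetch it directly with Jsoup instead of scraping the page. A minimal sketch, using the URL observed above (Jsoup will not parse the JSON for you; hand the resulting string to a JSON library such as Gson or Jackson):
import org.jsoup.Jsoup;

public class FetchScheduleJson {
    public static void main(String[] args) throws Exception {
        // ignoreContentType(true) is required because Jsoup rejects non-HTML
        // content types (such as application/json) by default.
        String json = Jsoup.connect("http://mlb.mlb.com/gen/schedule/phi/2011_4.json")
                .ignoreContentType(true)
                .execute()
                .body();
        System.out.println(json);
    }
}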
Related
I'm writing a scraper. When I use Inspect Element in Chrome, I see the following:
but when I run my code Elements data = doc.select("div.item-header"); and print the object data, I see that it contains the following chunk of HTML:
<div class="item-header">
<h1 class="text size-20">Snake print bell sleeves top</h1>
<div class="text size-12 muted brandname ma_top5">
<!-- data here is irrelevant -->
</div>
</div>
So, what I can't figure out is: why does my code get different HTML than what is visible in Chrome's Inspect Element? What am I missing here?
I'm using Java; the library is Jsoup. Any help is greatly appreciated.
Websites consist of HTML and JavaScript code. Often that JavaScript is executed when the page is loaded, and it's possible that the source of a page is modified or that additional content is loaded by asynchronous AJAX calls. Jsoup can't execute JavaScript, so it can only parse the original HTML document.
Don't use Chrome's Inspect option, as it presents the HTML after possible transformations. Use View source (Ctrl+U) instead. This way you'll see the original HTML source, unmodified by JavaScript (you can also try reloading the page with JavaScript disabled). That original source is what gets downloaded and parsed by Jsoup.
If that's the case and you really want to parse the data that's loaded by JavaScript, try to observe the XHR requests in Chrome's Network tab. You can check this answer to see what I mean: How to Load Entire Contents of HTML - Jsoup
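To see exactly what Jsoup is working with, you can print the markup it matched and compare it against View source. A small sketch (the URL here is a placeholder for whatever page you are scraping):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class WhatJsoupSees {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://example.com/some-product-page").get();
        Elements headers = doc.select("div.item-header");
        // Prints the HTML as it arrived over the wire, before any JavaScript ran;
        // it should match View source, not Chrome's Inspect panel.
        System.out.println(headers.outerHtml());
    }
}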
I am working on a web application which has a query form for entering search criteria. Once the criteria are filled out in the form and the search is run, a table loads below the search form.
Now this table is not formed from the usual tr and td tags but is made up of several script tags, like:
<TABLE>
<THEAD>...</THEAD>
<TBODY>
<SCRIPT>
var colElm1 = document.createElement("SPAN");
colElm1.innerText = "ABCD";
rowElm1.appendChild(colElm1);
</SCRIPT>
<SCRIPT>
var colElm1 = document.createElement("SPAN");
colElm1.innerText = "AB_CD123";
rowElm1.appendChild(colElm1);
</SCRIPT>
....
</TBODY>
</TABLE>
Now my problem: is there any way to get "ABCD" and "AB_CD123" using Selenium, without running a regex over the whole source code?
As the code in the SCRIPT tags shows, it appends a SPAN into the element 'rowElm1'.
rowElm1.appendChild(colElm1);
This 'rowElm1' should be some element on the page. Once you find where 'rowElm1' is assigned a value in the full page source, you will know how to locate it on the page, and you will see that the text of every appended SPAN shows up inside 'rowElm1'.
So you don't need to obtain the SPAN text from the SCRIPT tags; you should get it from the 'rowElm1' element on the page:
// Substitute a locator that actually matches the rowElm1 element on your page
driver.findElement(By.id("rowElm1")).getText();
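If you can't easily pin down rowElm1, another hedged option is to collect every SPAN the scripts append under the table body (the selector below assumes the generated rows end up inside the <tbody>):
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// 'driver' is your already-initialised WebDriver
List<WebElement> cells = driver.findElements(By.cssSelector("table tbody span"));
for (WebElement cell : cells) {
    System.out.println(cell.getText()); // expected to print "ABCD", "AB_CD123", ...
}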
UPDATE
I misunderstood the question; I thought you wanted to change the content of the <script> tags.
You can probably do this for all <script> tags one by one using JavascriptExecutor, as below:
WebElement scriptTag1 = driver.findElement(By.xpath("//table//tbody/script[1]"));
JavascriptExecutor js = (JavascriptExecutor)driver;
//you can use following line
js.executeScript("arguments[0].setAttribute('value', 'your new value');", scriptTag1);
//or if above line doesn't work then following line
js.executeScript("arguments[0].setAttribute('innerHTML', 'your new value');", scriptTag1);
'your new value' should be the whole content of the <script>...</script> tag.
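If you only need to read (rather than replace) the raw content of a <script> tag, the same JavascriptExecutor and element can be reused; a small sketch:
// returns the JavaScript source sitting between <script> and </script>
String scriptBody = (String) js.executeScript("return arguments[0].innerHTML;", scriptTag1);
System.out.println(scriptBody);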
I have the following html tag and I want to receive "name":"test_1476979972086" from my Java Selenium code.
How can I achieve this?
I already tried the getText and getAttribute functions, but without any success.
<a data-ng-href="#/devices"
target="_blank"
class="ng-binding"
href="#/devices">
{"name":"test_1476979972086"}
</a>
getText() is always empty. The XPath is unique; newDevice.created is unique on the page.
final By successMessageBy = By.xpath("//p[@data-ng-show='newDevice.created']/a");
final WebElement successMessage = wait.until(ExpectedConditions.presenceOfElementLocated(successMessageBy));
final String msg = successMessage.getText();
Actually, WebElement#getText() returns only the visible text. It's possible that the element is present but its text only becomes visible later.
So if WebElement#getText() doesn't work as expected, you should try getAttribute("textContent"), as below:
successMessage.getAttribute("textContent");
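Put together with the locator from the question, that would look roughly like this:
final By successMessageBy = By.xpath("//p[@data-ng-show='newDevice.created']/a");
final WebElement successMessage =
        wait.until(ExpectedConditions.presenceOfElementLocated(successMessageBy));
// textContent is populated even while the element is not yet visible
final String msg = successMessage.getAttribute("textContent");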
At first glance, the below should work. The fact that what you've tried doesn't work leads me to believe that you aren't selecting the correct element. Since I am ignorant of the rest of your HTML, this selector might not be unique; you'll have to play around with it, or share the surrounding HTML.
String json = driver.findElement(By.cssSelector("a[href$='/devices']")).getText();
I am trying to get a WebElement with Selenium:
driver.findElement(By.xpath("//input[@name='j_username']"))
But Selenium says: "Unable to find element with XPath ...".
The XPath is valid; I verified it with FirePath.
But the input element has the following invalid code:
<input size="10" type="text" name="j_username" maxlength="8">
I can't change the HTML file. Despite that, is there any solution to get the WebElement?
Thanks in advance!
Try selecting the element with a CSS selector, and also verify in FirePath (a Firebug add-on) that the element is located properly.
So your CSS selector would be something like:
input[name='j_username']
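In Java that selector would be used like this (a minimal sketch):
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

WebElement username = driver.findElement(By.cssSelector("input[name='j_username']"));
username.sendKeys("someUser"); // hypothetical value, just to show the element is usable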
A second approach is to use Firebug's internal mechanism for finding the XPaths of elements.
After these manipulations the driver should handle the element properly.
Well, I would suggest adding an id to your HTML code:
<input id="j_username" size="10" type="text" name="j_username" maxlength="8">
and findElement by id:
driver.findElement(By.id("j_username"));
I have faced similar issues with XPath (browser issues?), but id never fails for me. ;)
By the way, I feel your code should be:
driver.findElement(By.xpath(".//*[@name='j_username']"));
The best solution is to find out what Selenium is doing wrong, but without a URL or sample page to test on, it's a little hard. Is there any way you could dump the HTML into a jsfiddle? If there is, do that and paste the URL into the question, and I'm sure someone can find a solution.
If not, however, another way to get the results is to do it with jQuery. If Firebug is picking it up but not Selenium, then there's no reason why jQuery wouldn't get it. Here's how to go about doing that if needed:
Step 1: Is jQuery already present on the page? If so, then you don't need to do this bit; otherwise you will need to add it yourself by using driver.executeScript(addjQueryScript), where the script does something like this.
Step 2: Call WebElement input = driver.executeScript(elementSelector); where the elementSelector script would be something like return $('input[name="j_username"]').
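A minimal sketch of that second step, assuming jQuery ($) is available on the page (note the .get(0), which unwraps the jQuery object into the underlying DOM element so Selenium can hand it back as a WebElement):
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebElement;

JavascriptExecutor js = (JavascriptExecutor) driver;
WebElement input = (WebElement) js.executeScript(
        "return $(\"input[name='j_username']\").get(0);");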
My jQuery's not so good, but I believe that should work...
Best of luck!
If I have HTML that looks like:
<td class="blah">&nbsp;???? </td>
Could I get the ???? value using xpath?
What would it look like?
To use XPath you usually need XML, not HTML, but some parsers (e.g. the one built into PHP) have a relaxed mode which will parse most HTML, too.
If you want to find all <a> that are direct children of <td class="blah"> the XPath you need is
//td[@class = 'blah']/a
or
//td[@class = 'blah']/a[@href = 'http://...']
(depending on whether you only want the one url or all urls)
This will give you a set of nodes. You'll need to iterate through it, then check the nodeType of the firstChild (it should be a text node) and the number of child nodes (it should be 1). The firstChild will then contain the ????.
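A minimal Java sketch of that iteration, assuming the markup has already been cleaned up into well-formed XML and saved as a hypothetical page.xml:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BlahCellLinks {
    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse("page.xml");
        NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//td[@class = 'blah']/a", doc, XPathConstants.NODESET);
        for (int i = 0; i < links.getLength(); i++) {
            Node link = links.item(i);
            // keep only nodes whose single child is a text node; that text is the ???? value
            if (link.getChildNodes().getLength() == 1
                    && link.getFirstChild().getNodeType() == Node.TEXT_NODE) {
                System.out.println(link.getFirstChild().getNodeValue());
            }
        }
    }
}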
Why would you use an XML parser to parse HTML?
I would suggest using a dedicated Java HTML parser; there are many, but I haven't tried any myself.
As for your question of whether it would work: I suspect it will not. You will get an error when trying to parse it as XML right at &nbsp;, if not earlier.
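For comparison, a small sketch of the same extraction with Jsoup (used elsewhere in this thread), which tolerates HTML-only entities like &nbsp; that would trip up an XML parser:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BlahCellText {
    public static void main(String[] args) {
        String html = "<table><tr><td class=\"blah\">&nbsp;????</td></tr></table>";
        Document doc = Jsoup.parse(html);
        // text() returns the cell's text with entities decoded
        System.out.println(doc.select("td.blah").text());
    }
}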