I'm trying to get the content from a website that uses "onClick" instead of "href" in its hyperlinks, so the URL is always the same regardless of which page you are viewing.
http://www.sas.ul.pt/index.php
This is the website, and the content I'm trying to get is inside "Alimentação" > "Estudantes".
Estudantes
Is this possible with Jsoup?
Jsoup.connect(url).data("nav", "index#4;02", "opt", "4;02", "chvP", "127").post();
You can get the value of onclick with jsoup
http://jsoup.org/cookbook/extracting-data/attributes-text-html
Just replace the line
String linkHref = link.attr("href");
with this
String handler = link.attr("onclick");
However, after that there is no way to construct the URL unless you can somehow map the magic number to 4,02.
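If the handler text follows a predictable pattern, its quoted arguments can be pulled out with a regular expression and reused as POST parameters. This is only a minimal sketch: the handler name `navega` and its argument order are hypothetical (the question does not show the real onclick value), so the pattern would have to be adapted to whatever `link.attr("onclick")` actually returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OnclickArgs {
    // Matches single- or double-quoted string literals inside the handler text.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'|\"([^\"]*)\"");

    /** Returns the quoted arguments found in an onclick attribute value, in order. */
    static List<String> extract(String handler) {
        List<String> args = new ArrayList<>();
        Matcher m = QUOTED.matcher(handler);
        while (m.find()) {
            // Group 1 is set for single-quoted literals, group 2 for double-quoted ones.
            args.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return args;
    }

    public static void main(String[] argv) {
        // Hypothetical handler text; the real one would come from link.attr("onclick").
        String handler = "navega('index#4;02', '4;02', '127'); return false;";
        System.out.println(extract(handler));
    }
}
```

The extracted values could then be passed to `Jsoup.connect(url).data(...)` in place of the hard-coded strings, assuming they really correspond to the "nav", "opt" and "chvP" parameters.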
I have the following html tag and I want to receive "name":"test_1476979972086" from my Java Selenium code.
How can I achieve this?
I already tried the getText and getAttribute functions, but without any success.
<a data-ng-href="#/devices"
target="_blank"
class="ng-binding"
href="#/devices">
{"name":"test_1476979972086"}
</a>
getText() is always empty. The XPath is unique; newDevice.created is unique on the page.
final By successMessageBy = By.xpath("//p[@data-ng-show='newDevice.created']/a");
final WebElement successMessage = wait.until(ExpectedConditions.presenceOfElementLocated(successMessageBy));
final String msg = successMessage.getText();
Actually, WebElement#getText() returns only visible text. The element may already be present in the DOM while its text only becomes visible later.
So if WebElement#getText() doesn't work as expected, try getAttribute("textContent") instead:
successMessage.getAttribute("textContent");
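Unlike getText(), textContent is returned as raw text, so it will typically include the line breaks and indentation around the JSON inside the anchor. A minimal sketch of cleaning that up and pulling out the name value; the class and method names here are hypothetical, and the regular expression only handles a flat `{"name":"..."}` object (a JSON library is the better choice for anything more complex):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameField {
    // Naive pattern for a flat {"name":"..."} object without escaped quotes.
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"");

    /** Extracts the value of the "name" field from raw element text, if present. */
    static Optional<String> extractName(String textContent) {
        Matcher m = NAME.matcher(textContent.trim());
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for successMessage.getAttribute("textContent").
        String raw = "\n    {\"name\":\"test_1476979972086\"}\n";
        System.out.println(extractName(raw).orElse("(not found)"));
    }
}
```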
At first glance, the code below should work. The fact that what you've tried doesn't work leads me to believe that you aren't selecting the correct element. Since I haven't seen the rest of your HTML, this selector might not be unique; you'll have to play around with it, or share the surrounding HTML.
String json = driver.findElement(By.cssSelector("a[href$='/devices']")).getText();
I'm trying to select the tag <div class=kcm-read-text> on this website.
Jsoup can get the text inside that tag. But when I try to get the text inside the tag <div class=kcm-read-text> from here, it returns null. I've been trying to figure it out but still don't know the reason.
This is my code
Document dok = Jsoup.connect(URL).timeout(0).get();
Element isiBerita = dok.select("div.kcm-read-text").first();
I also tried this, but it returns the same result:
Element isiBerita = dok.select("div[class~=kcm-read-text]").first();
Both pages have the same HTML structure, just with different contents.
Thanks in advance for your help.
Response provided by saka1029
Change the user agent, like Jsoup.connect(URL).userAgent("Mozilla/5.0").timeout(...
So I'm trying to use HtmlUnit to go to a URL, but once you visit that URL it downloads a JSON file containing the data you want. I'm not sure how to word this, but basically: in HtmlUnit, how can I get the contents of a downloaded file?
I suck at explaining, so here, look.
I'm trying to check username availability like this:
private static final String URL = "https://twitter.com/users/username_available?username=";
...
HtmlPage page = webClient.getPage(URL + users[finalUsersIndex]);
So that basically creates a new page for each username. The thing is, URL + username returns a JSON file describing the user's availability. I know how to read the JSON file, but the problem is this:
java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage
cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage
I get that on this line
HtmlPage page = webClient.getPage(URL + users[finalUsersIndex]);
I suppose I need to create a new page for the response, but how would I do that, since it automatically downloads the file rather than, say, downloading it when a button is clicked? (Correct me if I'm wrong.)
Sorry, it's 4 AM.
As its name indicates, an HtmlPage is a page containing HTML. JSON is not HTML.
As the documentation indicates:
The DefaultPageCreator will create a Page depending on the content type of the HTTP response, basically HtmlPage for HTML content, XmlPage for XML content, TextPage for other text content and UnexpectedPage for anything else.
(emphasis mine).
So, as the exception you're getting indicates, the behavior you observed is the documented behavior: you're getting a page that is neither HTML, nor XML, nor text, so you get an UnexpectedPage.
Your code should thus be:
UnexpectedPage page = webClient.getPage(URL + users[finalUsersIndex]);
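To make the quoted rule concrete, here is a simplified model of the content-type dispatch it describes. This is not HtmlUnit's actual code (the real DefaultPageCreator recognizes more types), the class and enum names are hypothetical, and it assumes the endpoint answers with a JSON content type such as application/json:

```java
import java.util.Locale;

class PageKind {
    enum Kind { HTML_PAGE, XML_PAGE, TEXT_PAGE, UNEXPECTED_PAGE }

    /** Simplified model of the documented rule; not HtmlUnit's implementation. */
    static Kind forContentType(String contentType) {
        if (contentType == null) {
            return Kind.UNEXPECTED_PAGE;
        }
        // Drop parameters such as "; charset=utf-8" and normalize case.
        String type = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (type.equals("text/html")) {
            return Kind.HTML_PAGE;
        }
        if (type.equals("text/xml") || type.equals("application/xml")) {
            return Kind.XML_PAGE;
        }
        if (type.startsWith("text/")) {
            return Kind.TEXT_PAGE;
        }
        // Anything else, including JSON, falls through to the catch-all case.
        return Kind.UNEXPECTED_PAGE;
    }

    public static void main(String[] args) {
        System.out.println(forContentType("application/json; charset=utf-8"));
    }
}
```

With the real UnexpectedPage in hand, the JSON body should be readable through page.getWebResponse().getContentAsString().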
I am new to jsoup. I want to parse HTML, but the problem is the URL that has to be passed to Jsoup.connect(url): I will only receive this URL at runtime, in the response from some other page. Is there any way to pass the received URL into Jsoup.connect? I had read something like:
String html = response.getContentAsString();
Document document = Jsoup.parse(html);
But I don't understand exactly how to use it. I would also love to know if some other way of doing this is better than jsoup.
The "parse" function accepts an HTML content string, not a URL.
According to jsoup javadoc, the following should solve your problem:
Document doc = Jsoup.connect("http://example.com").get();
Note that the "connect" method returns a Connection object but does not actually connect. That is why an additional call to "get" (or "post", depending on what the handler on the server side expects) is needed.
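Since connect takes a plain String, a URL received at runtime can simply be passed in as a variable. A minimal sketch of checking such a value first, using only the JDK; the helper name requireHttpUrl is hypothetical, and the jsoup call is left as a comment because it needs jsoup on the classpath and network access:

```java
import java.net.URI;
import java.net.URISyntaxException;

class RuntimeUrl {
    /** Returns the trimmed URL if it is an absolute http(s) URL, otherwise throws. */
    static String requireHttpUrl(String candidate) {
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            boolean isHttp = scheme != null
                    && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
            if (!isHttp) {
                throw new IllegalArgumentException("Not an absolute http(s) URL: " + candidate);
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + candidate, e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a URL extracted from another page's response at runtime.
        String received = " http://example.com/page?id=1 ";
        String url = requireHttpUrl(received);
        System.out.println(url);
        // With jsoup available: Document doc = Jsoup.connect(url).get();
    }
}
```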
I have to get a link (that normally opens a new window on click) to open in the current window. My intent was to take the href attribute and just navigate to the URL, but the link has no href attribute; it only has its id and a class.
ex. <a id="thisLink" class="linkOut">someLinkText</a>
I only noticed when I tried to get the href attribute and received null. Is there a way to get the resulting url without opening the link or a way to open this link in the current window instead of a new one?
I'm testing the site through selenium webdriver and need to check the resulting page without opening it in a new window.
What exactly do you intend to do with a known-url#id URL? If you don't have a specific need, you can just keep known-url...
Anyway, for redirecting you can use window.location.href = new-url. And to get the URL, you either take the href or build the URL yourself from the id of the anchor.
-- Edit --
I see the latest update to your question's description... it seems this is much more of a Selenium "how to" question.
There is most likely a click event on that anchor which opens the new page. Look for it in the page's JavaScript code; it most likely finds the anchor by its id, so try searching for that.
If you have jQuery, try inspecting it like this: $._data($("#thisLink")[0], "events" ).click;
What is probably happening:
var link = document.getElementById('thisLink');
link.onclick = function() {
window.open('https://www.google.com');
};
What you want to happen:
var link = document.getElementById('thisLink');
link.onclick = function() {
window.location.href = "https://www.google.com";
};
You need to find where the onclick handler for your anchor tag is defined, and if it uses window.open, replace that with window.location.href.
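On the Selenium side, one way to check the resulting page without opening a new window is to read the handler's source, extract the URL, and navigate to it directly. The sketch below covers only the extraction step. It assumes the handler source has already been obtained as a string (for example through JavascriptExecutor) and that it contains a literal window.open('...') call; the class name is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WindowOpenUrl {
    // Captures the first quoted argument of a window.open(...) call.
    private static final Pattern WINDOW_OPEN =
            Pattern.compile("window\\.open\\(\\s*(['\"])(.*?)\\1");

    /** Returns the URL literal passed to window.open, if the source contains one. */
    static Optional<String> extract(String handlerSource) {
        Matcher m = WINDOW_OPEN.matcher(handlerSource);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the handler source read from the page.
        String source = "function() {\n  window.open('https://www.google.com');\n}";
        System.out.println(extract(source).orElse("(no window.open call found)"));
    }
}
```

The extracted URL could then be loaded in the current window with driver.get(url). If the URL is computed at runtime rather than written as a literal, this approach will not find it.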