Get price from webpage using Jsoup - java

I'm trying to get the price of a product from a webpage, specifically from the following HTML. I don't know how to use CSS selectors, but this is my attempt so far.
<div class="pd-price grid-100">
<!-- Selling Price -->
<div class="met-product-price v-spacing-small" data-met-type="regular">
<span class="primary-font jumbo strong art-pd-price">
<sup class="dollar-symbol" itemprop="PriceCurrency" content="USD">$</sup>
399.00</span>
<span itemprop="price" content="399.00"></span>
</div>
</div>
> $399.00
This snippet obviously sits deeper inside the full page, but here is the Java code I've tried so far:
String url ="https://www.lowes.com/pd/GE-700-sq-ft-Window-Air-Conditioner-115-Volt-14000-BTU-ENERGY-STAR/1000380463";
Document document = Jsoup.connect(url).timeout(0).get();
String price = document.select("div.pd-price").text();
String title = document.title(); //Get title
System.out.println(" Title: " + title); //Print title.
System.out.println(price);

First, you should familiarize yourself with CSS selectors; W3Schools has some resources to get you started.
In this case, the element you need is inside the div with the pd-price class, so div.pd-price is already correct.
You need to get that element first.
Element outerDiv = document.selectFirst("div.pd-price");
And then get the child div with another selector
Element innerDiv = outerDiv.selectFirst("div.met-product-price");
And then get the span element inside it
Element spanElement = innerDiv.selectFirst("span.art-pd-price");
At this point you could select the <sup> element, but in this case you can just call the text() method on the span to get its text:
System.out.println(spanElement.text());
This will print
$ 399.00
Edit:
After seeing the comments on the other answer:
You can copy the cookie from your browser and send it with Jsoup to get past the zip code requirement.
Document document = Jsoup.connect("https://www.lowes.com/pd/GE-700-sq-ft-Window-Air-Conditioner-115-Volt-14000-BTU-ENERGY-STAR/1000380463")
.header("Cookie", "<Your Cookie here>")
.get();

Element priceDiv = document.select("div.pd-price").first();
String price = priceDiv.select("span").last().attr("content");
If you need the currency symbol too:
String currencySymbol = priceDiv.select("sup").text();
I haven't run these, but they should work.
For more details, see the Jsoup API reference.
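Once the price text has been extracted, it usually has to become a number before it can be compared or summed. The following is a minimal sketch of that step, assuming a US-style string such as "$ 399.00" or "399.00". The class and method names (PriceParser, parsePrice) are hypothetical and not part of Jsoup.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of Jsoup: turns scraped price text into a number.
class PriceParser {

    // Assumes a US-style price ("$ 399.00", "1,299.50"): keeps digits and the
    // decimal point, drops currency symbols, spaces and thousands separators.
    static BigDecimal parsePrice(String raw) {
        String cleaned = raw.replaceAll("[^0-9.]", "");
        if (cleaned.isEmpty()) {
            throw new IllegalArgumentException("No numeric price found in: " + raw);
        }
        return new BigDecimal(cleaned);
    }

    public static void main(String[] args) {
        // Text as returned by spanElement.text() and by attr("content").
        System.out.println(parsePrice("$ 399.00"));
        System.out.println(parsePrice("399.00"));
    }
}
```

BigDecimal is used here rather than double so that the cents are kept exactly.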

Related

WebDriver getting an element java

I'm accessing a website with webdriver, and I want to change an element in it.
I want to know how to access an element and change it. In this case I want to add text to an element i.e. transform this:
<span data-id="tag-dist-Lisboa" title="Para eliminar uma das opções, faça duplo clique.">
</span>
Into this:
<span data-id="tag-dist-Lisboa" title="Para eliminar uma das opções, faça duplo clique.">
Some random text
</span>
My main problem has been finding the element.
I have added above some of the HTML, if you need more information please say so.
As I already answered on your previous question, use the following code to set the text of the span tag:
int textLength = driver.findElement(By.xpath("//span[@data-id='tag-dist-Lisboa']")).getText().length();
if(textLength<=0)
{
WebElement element = driver.findElement(By.xpath("//span[@data-id='tag-dist-Lisboa']"));
JavascriptExecutor js= (JavascriptExecutor)driver;
js.executeScript("arguments[0].innerText = 'Your Text Here'", element);
}
Explanation:
Find the required span tag by its data-id attribute and check whether it already contains text. If it does, use that text value as you see fit.
If it does not, set the required text on the span tag.
Let us know whether this answers your question or whether you want to do something else.

Parsing data using Jsoup

I am trying to parse job information out of an HTML page using the Jsoup parser. I want to extract all the job posting details, but I can't get the query right. I tried Tryjsoup.com to get an idea of the query structure, but I can't figure out how to get these tuples. Please also explain how to get a grip on their inner structure.
Html Code:
<div itemscope itemtype="http://schema.org/JobPosting" type="tuple" id="131015000050" class="row ">
<a count=1 href="some link">
<span itemprop=title><font class=hlite>Developer</font></span>
<span itemprop=hiringOrganization>Vm World</span>
</a>
</div>
<div class= "other details"><span itemprop=baseSalary><em></em>3000</span></div>
Expected Output:
String Post = Developer
String Company = Vm World
String Salary = 3000
I think you just need to use Element.select("span") for the block of HTML code.
Document doc = Jsoup.parse("<HTML code>");
Elements spans = doc.select("span");
for(Element span: spans) {
System.out.println(span.text());
}
The result of the above code:
Developer
Vm World
3000
Code for segregation:
Element title = doc.select("span[itemprop=title]").first();
Element company = doc.select("span[itemprop=hiringOrganization]").first();
Element salary = doc.select("span[itemprop=baseSalary]").first();
System.out.println(title.text());
System.out.println(company.text());
System.out.println(salary.text());

Get title attribute with jsoup

I have a problem with parsing a website.
The website contains a phrase like this:
<td class="school">
<abbr title data-original-title="Highschool">...</abbr>
</td>
How can I get the title (Highschool)?
I'm programming with jsoup and java.
Thanks for your help.
Try reading the jsoup cookbook.
First you should get the abbr element, and then its data-original-title attribute:
Element abbrElement = doc.select("abbr").first();
String originalTitle = abbrElement.attr("data-original-title");
Of course, you should make sure that you select the right abbr element. The code above selects the first one that appears in the document.
This can be done relatively easily using jsoup's DOM methods or a selector on a parsed document. Check out these links for reference:
DOM navigation
Extracting attributes
//assuming that the elements with class "school" contain the abbr tag that holds the title
Elements titles = doc.getElementsByClass("school").select("abbr");
for (Element t: titles) {
String title= t.attr("data-original-title");
//do something with the title
}

Get several elements with the same class name with Jsoup

Is there a way to get the HTML of several elements that share the same class name, using the Jsoup library for Java?
For example:
<div class="div_idalgo_content_result_date_match_local">
blablabla
</div>
<div class="div_idalgo_content_result_date_match_local">
123456789
</div>
I'd like to get blablabla in one String and 123456789 in another.
I hope my question is understandable.
This can be done in several different ways.
If you want to select the divs with the class name above, you can simply use the following:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
This will give you a collection of Element that you can iterate over.
If you then want only the first one, you can use the first() method or the :eq(0) pseudo-selector.
Element firstDiv = div.first();
OR
Elements div = doc.select("div.div_idalgo_content_result_date_match_local:eq(0)");
Note that with the second approach you select from the document, while with the first you select from the collection of Elements. Keep in mind that :eq(n) matches on an element's index among its siblings, so :eq(0) only matches a div that is the first child of its parent. You can of course change the index in :eq(0) to match a different element. There are many other useful selectors; see the link at the end of this answer.
The following code will split your divs into two:
Elements div = doc.select("div.div_idalgo_content_result_date_match_local");
Element firstDiv = div.first();
Element secondDiv = div.get(1);
System.out.println("This is the first div: " + firstDiv.text());
System.out.println("This is the second div: " + secondDiv.text());
JSoup Cookbook - Selector syntax

Selenium: Extract Text of a div with cssSelector in Java

I am writing a JUnit test for a webpage, using Selenium, and I am trying to verify that the expected text exists within a page. The code of the webpage I am testing looks like this:
<div id="recipient_div_3" class="label_spacer">
<label class="nodisplay" for="Recipient_nickname"> recipient field: reqd info </label>
<span id="Recipient_nickname_div_2" class="required-field"> *</span>
Recipient:
</div>
I want to compare what is expected with what is on the page, so I want to use
Assert.assertTrue(). I know that to get everything from the div, I can do
String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText().replaceAll("\n", " ");
but this will return "reqd info * Recipient:"
Is there any way to just get the text from the div ("Recipient") using cssSelector, without the other tags?
You can't do this with a CSS selector, because CSS selectors have no way to express "the text node contained in the DIV but not its other contents". You can express that with an XPath locator, though:
driver.findElement(By.xpath("//div[@id='recipient_div_3']/text()")).getText()
That XPath expression identifies only the text that is a direct child of the DIV, rather than all the text contained within it and its child nodes. Be aware that WebDriver locators must resolve to an element, so a driver may reject an expression ending in /text() with an InvalidSelectorException; in that case, evaluate the same XPath through JavascriptExecutor instead.
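To see what the /text() step selects without starting a browser, the expression can be tried with the XPath engine that ships with the JDK (javax.xml.xpath). This is only a sketch under assumptions: it parses a well-formed copy of the snippet from the question as XML, which real-world HTML often is not, and the class and method names are hypothetical. The div has several direct text nodes (the whitespace between the child tags counts too), so they are joined and trimmed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: uses the JDK's XPath engine, not Selenium.
class DirectTextDemo {

    // Well-formed copy of the markup from the question.
    static final String SNIPPET =
          "<div id=\"recipient_div_3\" class=\"label_spacer\">\n"
        + "<label class=\"nodisplay\" for=\"Recipient_nickname\"> recipient field: reqd info </label>\n"
        + "<span id=\"Recipient_nickname_div_2\" class=\"required-field\"> *</span>\n"
        + "Recipient:\n"
        + "</div>";

    // Evaluates an XPath that selects text nodes and returns their joined, trimmed text.
    static String directText(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            StringBuilder joined = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                joined.append(nodes.item(i).getNodeValue());
            }
            return joined.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Only the div's own text is selected; the label and span text are skipped.
        System.out.println(directText(SNIPPET, "//div[@id='recipient_div_3']/text()"));
    }
}
```

The label and span contents are left out because /text() only steps to text nodes that are direct children of the div.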
I am not sure whether it is possible with a single CSS locator, but you can get the text of the div, then get the text of the div's child elements and subtract it. Something like this (the code wasn't checked):
String element = driver.findElement(By.cssSelector("div[id='recipient_div_3']")).getText().replaceAll("\n", " ");
List<WebElement> tempElements = driver.findElements(By.cssSelector("div[id='recipient_div_3'] *"));
for (WebElement tempElement : tempElements) {
// Remove each child element's text from the div's full text.
element = element.replace(tempElement.getText(), "");
}
element = element.trim();
This is for the case where you want to avoid XPath. XPath lets you express it directly:
//div[@id='recipient_div_3']/text()
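You could also take the element's inner HTML and remove the child elements with a regexp. Note that you should use a reluctant quantifier:
https://docs.oracle.com/javase/tutorial/essential/regex/quant.html
String getTextContentWithoutTags(WebElement element) {
// getText() contains no tags, so work on the inner HTML instead.
String html = element.getAttribute("innerHTML");
// Remove each child element with its content; .*? is reluctant.
return html.replaceAll("(?s)<(\\w+)[^>]*>.*?</\\1>", "").trim();
}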
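As a self-contained check of the reluctant-quantifier idea, the sketch below runs on a plain string, so no browser is needed. It assumes the inner HTML of the div has already been obtained (for example via getAttribute("innerHTML")); the class name, method name and sample constant are hypothetical. A regex is not a general HTML parser: nested elements with the same tag name and void tags such as <br> are not handled here.

```java
import java.util.regex.Pattern;

// Hypothetical helper: strips child elements (tags plus content) from an HTML fragment.
class ChildElementStripper {

    // Inner HTML of the div from the question, as a plain string.
    static final String INNER_HTML =
          "\n<label class=\"nodisplay\" for=\"Recipient_nickname\"> recipient field: reqd info </label>\n"
        + "<span id=\"Recipient_nickname_div_2\" class=\"required-field\"> *</span>\n"
        + "Recipient:\n";

    // (?s) lets '.' match line breaks; .*? is reluctant, so each match ends at the
    // first closing tag whose name equals the opening tag (back-reference \1).
    private static final Pattern CHILD_ELEMENT =
            Pattern.compile("(?s)<(\\w+)[^>]*>.*?</\\1>");

    static String ownText(String innerHtml) {
        return CHILD_ELEMENT.matcher(innerHtml).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(ownText(INNER_HTML));
        System.out.println(ownText("<span>a</span> keep <span>b</span>"));
    }
}
```

With a greedy .* the second call would match from the first <span> to the last </span> and remove the text in between as well, which is why the reluctant form is used.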
