Using jsoup for extracting price

I want to extract the price 9.99 from the source of this page:
https://www.walmart.com/ip/Terminator-Genisys-DVD/45863333?sourceid=api00ctd43f4bc7559f459fae574f62a0e9de01&affp1=%7Capk%7C&affilsrc=api&veh=aff&wmlspartner=readonlyapi
The code I am using is
public String doubleCheckPrice(String html, IDoubleCheckable availability) throws URISyntaxException, IOException {
    Document doc = Jsoup.parse(html);
    String price = null;
    for (Element meta : doc.select("div")) {
        if ((meta.attr("itemprop") != null) && (meta.attr("itemprop").equals("price"))) {
            price = meta.text();
            price = price.replace("$", "").trim();
            logger.debug("Extracted price via double check {} for availability {}", price, availability.getUrl());
        }
    }
    if (price == null) {
        Elements elements = doc.select(".js-price-display");
        if (elements != null && elements.size() > 0) {
            price = elements.get(0).text();
            price = price.replace("$", "").trim();
        }
    }
    return price;
}
But I am getting null. Any help will be appreciated.
Thanks

I think you should use Walmart's API for this purpose; that is the most reliable way.
Alternatively, if you cannot use an API, use an HTML parsing library such as jsoup: https://jsoup.org/
jsoup parses the page into a structured document and lets you look up elements by tag, class, or ID. You can then use getElementById, getElementsByClass, or a CSS query passed to select to fetch the data. Have a look at the examples on the site.
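To make the lookup concrete, here is a minimal sketch, assuming the price sits in an element carrying itemprop="price". The class name, method name, and sample markup are made up for illustration and do not reflect the real page. An attribute selector matches any tag, whereas the code in the question only inspects div elements. Note also that jsoup does not execute JavaScript, so a price that is rendered client-side will not appear in the parsed document.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper, not part of the original question.
final class ItempropPriceSketch {

    /** Returns the text of the first element with itemprop="price", or null if there is none. */
    static String extractPrice(String html) {
        Document doc = Jsoup.parse(html);
        // Attribute selector: matches <span>, <div>, <meta>, ... as long as itemprop equals "price".
        Element priceElement = doc.selectFirst("[itemprop=price]");
        if (priceElement == null) {
            return null;
        }
        return priceElement.text().replace("$", "").trim();
    }

    public static void main(String[] args) {
        // Made-up markup; the real page is structured differently.
        String html = "<div id=\"offer\"><span itemprop=\"price\">$9.99</span></div>";
        System.out.println(extractPrice(html));
    }
}
```

If the element is a meta tag, the value would likely be in its content attribute, so attr("content") may be needed instead of text().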

I found the solution for this. Here it is:
for (Element meta : doc.select(".Price-group")) {
    // attr() returns an empty string, never null, so test with hasAttr()
    if (meta.hasAttr("aria-label")) {
        System.out.println(meta.attr("aria-label"));
        price = meta.text();
        price = price.replace("$", "").trim();
        logger.debug("Extracted price via double check {} for availability {}", price, availability.getUrl());
    }
}

Here is the solution
Elements priceElms = document.select(".prod-BotRow.prod-showBottomBorder.prod-OfferSection .prod-PriceHero .Price-group");
// declared outside the if block so the value can be used afterwards
String price = null;
if (priceElms.size() > 0) {
    price = priceElms.get(0).text();
    price = price.replace("$", "");
}
There is no need to loop to get the value: select exactly the field you want with a jsoup selector.
Thanks
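Once the text has been selected, it usually still has to be turned into a number. The following is only a sketch of that step, using the JDK alone. The class and method names are hypothetical, and it assumes US-style formatting such as "$1,299.99" (comma as thousands separator, dot as decimal point).

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical helper for normalising scraped price text.
final class PriceText {

    /** Parses text such as "$9.99" into a BigDecimal; empty if no number can be read. */
    static Optional<BigDecimal> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // Keep digits and the decimal point only (drops "$", commas, and whitespace).
        String cleaned = raw.replaceAll("[^0-9.]", "");
        if (cleaned.isEmpty()) {
            return Optional.empty();
        }
        try {
            return Optional.of(new BigDecimal(cleaned));
        } catch (NumberFormatException e) {
            // e.g. "." or "1.2.3" left over after cleaning
            return Optional.empty();
        }
    }
}
```

Returning an Optional instead of null makes the "no price found" case explicit for the caller.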

Related

How to fetch all the details that are inside the td tag one by one by using HtmlUnit?

So far I have managed to grab some of the details from the sub-categories, but not all of them. My current output looks like this:
Product Name :Shoes
Product Details :Shoes
Date :12-06-2020
Price :Rs. 2,500
(Brand New)
Here Product Name and Product Details give the same result, which is not what I want. I want something like this:
Product Name :Shoes
Product Details :Brand New Shoes Highcopy ...
Seller :s unil t
Date :12-06-2020
Price :Rs. 2,500
Usage :Brand New
For your convenience, this is the site I'm scraping:
https://hamrobazaar.com/c6-apparels-and-accessories
The code is:
public static void main(String[] args) throws IOException {
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    final String url = "https://hamrobazaar.com/c6-apparels-and-accessories";
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
    HtmlPage rootPage = webClient.getPage(new URL(url));
    List<HtmlTable> tableElements = rootPage
            .getByXPath("/html/body/table/tbody/tr[2]/td/table/tbody/tr[1]/td/table[5]/tbody/tr/td[2]/table");
    try {
        for (int i = 0; i < tableElements.size(); i++) {
            if (i == 0 || i == 1) {
                continue;
            }
            HtmlTableRow row = tableElements.get(i).getRow(0);
            HtmlTableCell productCell = row.getCell(2);
            if (productCell.getElementsByTagName("a").get(0).asText().equals(null)) {
                continue;
            } else {
                String productName = productCell.getElementsByTagName("a").get(0).asText();
                System.out.println("Product Name :" + productName);
            }
            System.out.println("Product Details :" + productCell.getElementsByTagName("font").get(0).asText() );
            System.out.println("Date :" + row.getCell(3).asText());
            System.out.println("Price :" + row.getCell(4).asText());
        }
    } catch (Exception e) {
        System.out.println("Exception raised");
    }
}
Please help me sort this out. Many thanks.

I did a quick check of the page using the Firefox developer tools to inspect the DOM tree.
Your code searches inside the font tag for the product details:
System.out.println("Product Details :" + productCell.getElementsByTagName("font").get(0).asText() );
but as far as I can see, the text you are looking for is outside the font tag.
I think you have to use the sibling of the sibling of the font tag.

Price extraction in java

I am trying to create a Discord bot that looks up an item entered by the user ("!price item") and then gives me a price that I can work with later in the code. I figured out how to get the HTML into a String or a Document, but I am struggling to find a way to extract only the prices.
Here is the code:
@Override
public void onMessageReceived(MessageReceivedEvent event) {
    String html;
    System.out.println("I received a message from " +
            event.getAuthor().getName() + ": " +
            event.getMessage().getContentDisplay());
    if (event.getMessage().getContentRaw().contains("!price")) {
        String input = event.getMessage().getContentDisplay();
        String item = input.substring(9).replaceAll(" ", "%20");
        String URL = "https://www.google.lt/search?q=" + item + "%20price";
        try {
            html = Jsoup.connect(URL).userAgent("Mozilla/49.0").get().html();
            html = html.replaceAll("[^\\ ,.£€eur0123456789]"," ");
        } catch (Exception e) {
            return;
        }
        System.out.println(html);
    }
}
The biggest problem is that I am using Google search, so the prices are not in the same place in the HTML. Is there a way I can extract only (numbers + EUR) or (a euro sign + price) from the HTML?

You can do that by scraping the website. Here's a simple example that does what you are looking for using jsoup:
import java.io.IOException;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            String query = "oneplus";
            String url = "https://www.google.com/search?q=" + query + "%20price&client=firefox-b&source=lnms&tbm=shop&sa=X";
            int pricesToRetrieve = 3;
            ArrayList<String> prices = new ArrayList<String>();
            Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
            // one element per shopping result
            Elements elements = document.select("div.pslires");
            for (Element element : elements) {
                String price = element.select("div > div > b").text();
                // keep only the amount and the currency sign
                String[] finalPrice = price.split(" ");
                prices.add(finalPrice[0] + finalPrice[1]);
                pricesToRetrieve -= 1;
                if (pricesToRetrieve == 0) {
                    break;
                }
            }
            System.out.println(prices);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
That piece of code will output:
[347,10€, 529,90€, 449,99€]
If you want to retrieve more information, connect jsoup to the Google Shopping URL with your desired query and scrape the result. In this case I scraped Google Shopping for OnePlus to check its prices, but you can also get the URL to buy it, the full product name, etc. This piece of code retrieves the first 3 prices listed in Google Shopping and adds them to an ArrayList of String. Before adding each one to the ArrayList, I split the retrieved text on spaces so that I keep only the information I want: the price.
This is a simple scraping example; if you need anything else, feel free to ask! And if you want to learn more about scraping with jsoup, check this link.
Hope this helps!
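For the literal question (pulling out only "number + EUR" or "euro sign + number" from arbitrary text), a regular expression is another option. The sketch below uses only the JDK. The class name and the pattern are illustrative assumptions, and the pattern deliberately stays simple: it does not handle thousands separators such as "1.299,00 €". It would probably work better on the visible text of the page (for example the result of jsoup's Document.text()) than on raw HTML.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; \u20AC is the euro sign.
final class EuroPriceFinder {

    // Either "€ 12,34" (sign first) or "12,34 €" / "12.34 EUR" (amount first).
    private static final Pattern EURO_PRICE = Pattern.compile(
            "\u20AC\\s?\\d+(?:[.,]\\d{2})?"
            + "|"
            + "\\d+(?:[.,]\\d{2})?\\s?(?:\u20AC|EUR)");

    /** Returns every substring of the text that looks like a euro price, in order of appearance. */
    static List<String> findPrices(String text) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = EURO_PRICE.matcher(text);
        while (matcher.find()) {
            prices.add(matcher.group());
        }
        return prices;
    }
}
```

Numbers that are not followed or preceded by a currency marker, such as "64 GB", are skipped because neither alternative of the pattern matches them.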

How to use Jsoup to get href link without the extra characters?

I have an Elements list on which I'm using jsoup's attr() method to get the href attribute.
Here is part of my code:
String searchTerm = "tutorial+programming+" + i_SearchPhrase;
int num = 10;
String searchURL = GOOGLE_SEARCH_URL + "?q=" + searchTerm + "&num=" + num;
Document doc = Jsoup.connect(searchURL).userAgent("chrome/5.0").get();
Elements results = doc.select("h3.r > a");
String linkHref;
for (Element result : results) {
    linkHref = result.attr("href").replace("/url?q=", "");
    //some more unrelated code...
}
So for example, when I use the search phrase "test", attr("href") produces (first in the list):
linkHref = https://www.tutorialspoint.com/software_testing/&sa=U&ved=0ahUKEwi_lI-T69jTAhXIbxQKHU1kBlAQFggTMAA&usg=AFQjCNHr6EzeYegPDdpHJndLJ-889Sj3EQ
where I only want: https://www.tutorialspoint.com/software_testing/
What is the best way to fix this? Do I just add some string operations on linkHref (which I know how to do), or is there a way to make the href attribute contain the shorter link to begin with?
Thank you in advance

If you always want to remove the query parameters, you can make use of String.indexOf(), e.g.:
int lastPos;
if (linkHref.indexOf("?") > 0) {
    lastPos = linkHref.indexOf("?");
} else if (linkHref.indexOf("&") > 0) {
    lastPos = linkHref.indexOf("&");
} else {
    lastPos = -1;
}
if (lastPos != -1) {
    linkHref = linkHref.substring(0, lastPos);
}
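The same idea can be wrapped in a small helper that works on the raw href before the "/url?q=" prefix is removed. This is a sketch under the assumption that the target address is the value of the q parameter and that q is the first parameter, as in the example from the question. The class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for hrefs of the form "/url?q=<target>&sa=...&ved=...".
final class RedirectLinkCleaner {

    private static final String PREFIX = "/url?q=";

    /** Returns the target address of a redirect-style href, or the href unchanged if it has another shape. */
    static String targetUrl(String href) {
        if (!href.startsWith(PREFIX)) {
            return href;
        }
        String rest = href.substring(PREFIX.length());
        // Everything after the first '&' belongs to the tracking parameters.
        int amp = rest.indexOf('&');
        String encoded = amp >= 0 ? rest.substring(0, amp) : rest;
        try {
            // The value may be percent-encoded, e.g. %3F for '?'.
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

In the loop from the question this could be used as linkHref = RedirectLinkCleaner.targetUrl(result.attr("href"));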

Read a specified line of text from a webpage with Jsoup

So I am trying to get the data from this webpage using Jsoup...
I've tried looking up many different ways of doing it and I've gotten close, but I don't know how to find the tags for certain stats (Attack, Strength, Defence, etc.).
So let's say, for example's sake, that I wanted to print out
'Attack', '15', '99', '200,000,000'
How should I go about doing this?

You can use CSS selectors in Jsoup to easily extract the column data.
// retrieve page source code
Document doc = Jsoup
        .connect("http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Lynx%A0Titan")
        .get();
// find all of the table rows
Elements rows = doc.select("div#contentHiscores table tr");
ListIterator<Element> itr = rows.listIterator();
// loop over each row
while (itr.hasNext()) {
    Element row = itr.next();
    // does the second col contain the word attack?
    if (row.select("td:nth-child(2) a:contains(attack)").first() != null) {
        // if so, assign each sibling col to variable
        String rank = row.select("td:nth-child(3)").text();
        String level = row.select("td:nth-child(4)").text();
        String xp = row.select("td:nth-child(5)").text();
        System.out.printf("rank=%s level=%s xp=%s", rank, level, xp);
        // stop looping rows, found attack
        break;
    }
}

A very rough implementation is shown below. It is only a snippet; optimizations and other conditionals still need to be added.
public static void main(String[] args) throws Exception {
    Document doc = Jsoup
            .connect("http://services.runescape.com/m=hiscore_oldschool/hiscorepersonal.ws?user1=Lynx%A0Titan")
            .get();
    Element contentHiscoresDiv = doc.getElementById("contentHiscores");
    Element table = contentHiscoresDiv.child(0);
    for (Element row : table.select("tr")) {
        Elements tds = row.select("td");
        for (Element column : tds) {
            if (column.children() != null && column.children().size() > 0) {
                Element anchorTag = column.getElementsByTag("a").first();
                if (anchorTag != null && anchorTag.text().contains("Attack")) {
                    System.out.println(anchorTag.text());
                    Elements attributeSiblings = column.siblingElements();
                    for (Element attributeSibling : attributeSiblings) {
                        System.out.println(attributeSibling.text());
                    }
                }
            }
        }
    }
}
Attack
15
99
200,000,000

Split a linked list nodes multiple data and show the resultant

I want to use a text file to populate a linked list in a Java SE app. Each line of the text file has the following format:
firstnamelastname mobile home office
I want to insert these lines as nodes in the linked list, and then search the list for a specific node.
Using firstnamelastname as the key, I want to compare it with each node's data (the data has to be split so it can be compared with the key). After finding the matching node, I want to split that node's data using split(" ") and display the result.
I just need a hint on how to do all of this; any help would be much appreciated.
I have written some Java code, but it is not working: it always gives me the last node's data. Please point out the mistake I made, or, if the approach is totally wrong, suggest another idea.
try {
    String fullname;
    String mobile;
    String home;
    String mobile2;
    String office;
    Node current = first;
    while (current.data != key) {
        String splitter[] = current.data.split(" ");
        fullname = splitter[0];
        mobile = splitter[1];
        home = splitter[2];
        mobile2 = splitter[3];
        office = splitter[4];
        if (fullname == null ? key == null : fullname.equals(key)) {
            mobilefield.setText(mobile);
            homefield.setText(home);
            mobilefield2.setText(mobile2);
            officefield.setText(office);
        } else {
            throw new FileNotFoundException(
                    "SORRY RECORD NOT LISTED IN DATABASE");
        }
        break;
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(this, e.getMessage()
            + "\nPLEASE TRY AGAIN !", "Search Error",
            JOptionPane.ERROR_MESSAGE);
}
(key is firstname + lastname.)

I would use an ArrayList (it needs less memory and has better performance for most operations).
The code looks OK to me, but I would not split current.data for every entry.
I would use a test like
if (current.data.startsWith(key)) {
    String splitter[] = current.data.split(" ");
    ...
}
and split current.data only when this is true, because only in that case do you need mobile, home, and office.
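Put together, the search could look like the sketch below. It is a standalone illustration, not the asker's actual classes: the names Node, data, and first come from the question, while next, last, add, and find are assumptions. Two details matter for the "wrong node" symptom: the loop has to advance with current = current.next on every pass, and strings have to be compared by content (equals or startsWith) rather than with == or !=.

```java
import java.util.Optional;

// Hypothetical minimal singly linked list of "firstnamelastname mobile home office" lines.
final class ContactList {

    static final class Node {
        final String data;
        Node next;

        Node(String data) {
            this.data = data;
        }
    }

    private Node first;
    private Node last;

    /** Appends one line of the text file as a new node. */
    void add(String line) {
        Node node = new Node(line);
        if (first == null) {
            first = node;
        } else {
            last.next = node;
        }
        last = node;
    }

    /** Returns the split fields of the first node whose name equals the key, if any. */
    Optional<String[]> find(String key) {
        for (Node current = first; current != null; current = current.next) {
            // The trailing space keeps "jane" from matching "janedoe ...".
            if (current.data.startsWith(key + " ")) {
                // Split only the matching node, as suggested above.
                return Optional.of(current.data.split(" "));
            }
        }
        return Optional.empty();
    }
}
```

The caller can then fill the text fields from the returned array, or show the "record not listed" dialog when the Optional is empty.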
