Price extraction in Java

I am trying to create a Discord bot that looks up an item the user enters with "!price item" and returns a price that I can work with later in the code. I figured out how to get the HTML into a String or a Document, but I am struggling to find a way to extract only the prices.
Here is the code:
@Override
public void onMessageReceived(MessageReceivedEvent event) {
String html;
System.out.println("I received a message from " +
event.getAuthor().getName() + ": " +
event.getMessage().getContentDisplay());
if (event.getMessage().getContentRaw().contains("!price")) {
String input = event.getMessage().getContentDisplay();
String item = input.substring(9).replaceAll(" ", "%20");
String URL = "https://www.google.lt/search?q=" + item + "%20price";
try {
html = Jsoup.connect(URL).userAgent("Mozilla/49.0").get().html();
html = html.replaceAll("[^\\ ,.£€eur0123456789]"," ");
} catch (Exception e) {
return;
}
System.out.println(html);
}
}
The biggest problem is that I am using Google search, so the prices are not in the same place in the HTML. Is there a way to extract only (numbers + EUR) or (a euro sign + price) from the HTML?

You can easily do that by scraping the website. Here's a simple working example that does what you are looking for using jsoup:
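import java.io.IOException;
import java.util.ArrayList;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {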
public static void main(String[] args) {
try {
String query = "oneplus";
String url = "https://www.google.com/search?q=" + query + "%20price&client=firefox-b&source=lnms&tbm=shop&sa=X";
int pricesToRetrieve = 3;
ArrayList<String> prices = new ArrayList<String>();
Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
Elements elements = document.select("div.pslires");
for (Element element : elements) {
String price = element.select("div > div > b").text();
String[] finalPrice = price.split(" ");
prices.add(finalPrice[0] + finalPrice[1]);
pricesToRetrieve -= 1;
if (pricesToRetrieve == 0) {
break;
}
}
System.out.println(prices);
} catch (IOException e) {
e.printStackTrace();
}
}
}
That piece of code will output:
[347,10€, 529,90€, 449,99€]
If you want to retrieve more information, just connect jsoup to the Google Shopping URL with your desired query and scrape the result. In this case I scraped Google Shopping for OnePlus to check its prices, but you can also get the purchase URL, the full product name, etc. This code retrieves the first 3 prices listed in Google Shopping and adds them to an ArrayList of String. Before adding each one to the ArrayList, I split the retrieved text on spaces so that I keep only the information I want: the price.
This is a simple scraping example; if you need anything else, feel free to ask! And if you want to learn more about scraping with jsoup, check this link.
Hope this helped you!
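For the original question (pulling only "number + EUR" or "euro sign + number" out of arbitrary text), a regular expression over the page text may be enough. The following is a minimal sketch that uses only the JDK; the class name PriceExtractor, the method extractPrices, and the exact pattern are assumptions made for illustration, and the pattern will need tuning for real pages (thousands separators, other currencies).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceExtractor {
    // A number with an optional 1-2 digit decimal part, e.g. 347,10 or 529.90 or 449.
    private static final String NUMBER = "\\d+(?:[.,]\\d{1,2})?";

    // Either a currency sign followed by a number (\u20AC is the euro sign,
    // \u00A3 the pound sign), or a number followed by a sign or the word EUR.
    private static final Pattern PRICE = Pattern.compile(
            "[\u20AC\u00A3]\\s?" + NUMBER
            + "|" + NUMBER + "\\s?(?:[\u20AC\u00A3]|EUR\\b)",
            Pattern.CASE_INSENSITIVE);

    // Returns every price-looking substring in the order it appears in the text.
    public static List<String> extractPrices(String text) {
        List<String> prices = new ArrayList<>();
        Matcher matcher = PRICE.matcher(text);
        while (matcher.find()) {
            prices.add(matcher.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        String sample = "from 347,10 \u20AC or \u20AC529.90, also 449 EUR; model 6T, 128 GB";
        System.out.println(extractPrices(sample));
    }
}
```

Bare numbers such as "6T" or "128 GB" are skipped because the pattern requires a currency marker next to the number. In the bot, the input could be the text of the fetched page rather than the raw HTML, which keeps tag attributes from producing false matches.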

Related

How to fetch all the details that are inside the td tag one by one by using HtmlUnit?

So far I have successfully grabbed some of the details from the sub-categories, but that is not enough. I have to grab each and every detail. Currently I get:
Product Name :Shoes
Product Details :Shoes
Date :12-06-2020
Price :Rs. 2,500
(Brand New)
Here Product Name and Product Details give the same result, which is not what I want. I want something like this:
Product Name :Shoes
Product Details :Brand New Shoes Highcopy ...
Seller :s unil t
Date :12-06-2020
Price :Rs. 2,500
Usage :Brand New
For your convenience, this is the site I'm scraping:
https://hamrobazaar.com/c6-apparels-and-accessories
The code is:
public static void main(String[] args) throws IOException {
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log",
"org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
final String url = "https://hamrobazaar.com/c6-apparels-and-accessories";
WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
HtmlPage rootPage = webClient.getPage(new URL(url));
List<HtmlTable> tableElements = rootPage
.getByXPath("/html/body/table/tbody/tr[2]/td/table/tbody/tr[1]/td/table[5]/tbody/tr/td[2]/table");
try{
for (int i = 0; i < tableElements.size(); i++) {
if (i == 0 ||
i == 1) {
continue;
}
HtmlTableRow row = tableElements.get(i).getRow(0);
HtmlTableCell productCell = row.getCell(2);
if(productCell.getElementsByTagName("a").get(0).asText().equals(null)) {
continue;
} else {
String productName = productCell.getElementsByTagName("a").get(0).asText();
System.out.println("Product Name :"+productName);
}
System.out.println("Product Details :" + productCell.getElementsByTagName("font").get(0).asText() );
System.out.println("Date :" + row.getCell(3).asText());
System.out.println("Price :" + row.getCell(4).asText());
}
} catch (Exception e) {
System.out.println("Exception raised");
}
}
Please help me sort this out. Many thanks.

I just did a quick check of the page, using the Firefox developer tools to inspect the DOM tree.
Your code searches inside the font tag for the product details
System.out.println("Product Details :" + productCell.getElementsByTagName("font").get(0).asText() );
but as far as I can see, the text you are looking for is outside the font tag.
I think you have to use the sibling of the sibling of the font tag.
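The sibling walk suggested above can be sketched as follows. To keep the example self-contained it uses the JDK's built-in org.w3c.dom parser instead of HtmlUnit, and the cell markup is a made-up, well-formed stand-in for the real page, so treat both as assumptions. HtmlUnit's DomNode also offers getNextSibling(), so the same loop shape should carry over. Rather than hard-coding "sibling of the sibling", the loop moves forward until it reaches the first non-blank text node, which tolerates an extra br or whitespace node in between.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SiblingText {
    // Returns the first non-blank text that follows the first <font> element,
    // or an empty string if there is no <font> or no text after it.
    public static String textAfterFont(String cellMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(cellMarkup)));
            Node font = doc.getElementsByTagName("font").item(0);
            if (font == null) {
                return "";
            }
            // Walk the following siblings until a text node with content appears.
            for (Node n = font.getNextSibling(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.TEXT_NODE
                        && !n.getTextContent().trim().isEmpty()) {
                    return n.getTextContent().trim();
                }
            }
            return "";
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse cell markup", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical cell layout: link, line break, font tag, line break, details text.
        String cell = "<td><a>Shoes</a><br/><font>Shoes</font><br/>Brand New Shoes Highcopy</td>";
        System.out.println("Product Details :" + textAfterFont(cell));
    }
}
```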

Using jsoup for extracting price

I want to get the price 9.99 from this page's source:
https://www.walmart.com/ip/Terminator-Genisys-DVD/45863333?sourceid=api00ctd43f4bc7559f459fae574f62a0e9de01&affp1=%7Capk%7C&affilsrc=api&veh=aff&wmlspartner=readonlyapi
The code I am using is
public String doubleCheckPrice(String html, IDoubleCheckable availability) throws URISyntaxException, IOException{
Document doc = Jsoup.parse(html);
String price = null;
for(Element meta : doc.select("div")) {
if((meta.attr("itemprop") != null) && (meta.attr("itemprop").equals("price"))) {
price = meta.text();
price = price.replace("$", "").trim();
logger.debug("Extracted price via double check {} for availability {}", price, availability.getUrl());
}
}
if(price == null) {
Elements elements = doc.select(".js-price-display");
if(elements != null && elements.size() > 0) {
price = elements.get(0).text();
price = price.replace("$", "").trim();
}
}
return price;
}
But I am getting null. Any help will be appreciated.
Thanks

I think you should use Walmart's API for this purpose. That is the best way.
Alternatively, if you cannot use an API, you should use a library for this. Have a look at https://jsoup.org/
This library lets you parse the page into a structured document and iterate over tags, classes, or IDs. You can then use methods such as getElementById or select to fetch the data. Have a look at the examples on the site.

I got the solution for this. Here it is:
for (Element meta : doc.select(".Price-group")) {
    // attr() returns an empty string rather than null, so test with hasAttr()
    if (meta.hasAttr("aria-label")) {
        System.out.println(meta.attr("aria-label"));
        price = meta.text();
        price = price.replace("$", "").trim();
        logger.debug("Extracted price via double check {} for availability {}", price, availability.getUrl());
    }
}

Here is the solution
Elements priceElms=document.select(".prod-BotRow.prod-showBottomBorder.prod-OfferSection .prod-PriceHero .Price-group");
if(priceElms.size() > 0){
String price=priceElms.get(0).text();
price=price.replace("$","");
}
There is no need to loop to get the values; just select the field you want with jsoup selectors.
Thanks
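Once the price text has been selected, it usually needs to become a number before it can be compared or stored. The sketch below shows one way to do that with BigDecimal; the class name PriceParser and the method parseUsd are hypothetical, and it assumes US-style formatting (comma as thousands separator, dot as decimal point). It returns null on failure to match the null convention used in the code above.

```java
import java.math.BigDecimal;

public class PriceParser {
    // Converts text such as "$9.99" or " $1,299.00 " to a BigDecimal.
    // Returns null when the text is missing or is not a plain number.
    public static BigDecimal parseUsd(String raw) {
        if (raw == null) {
            return null;
        }
        // Drop the currency sign and thousands separators, then surrounding spaces.
        String cleaned = raw.replace("$", "").replace(",", "").trim();
        if (cleaned.isEmpty()) {
            return null;
        }
        try {
            return new BigDecimal(cleaned);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseUsd("$9.99"));
    }
}
```

BigDecimal is used instead of double so that monetary values are not subject to binary floating-point rounding.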

Selecting elements by class with Jsoup

Hi, I am trying to parse data from Yahoo Finance using jsoup in Eclipse by selecting elements by their class with the code below.
This method has worked for me with other websites but will not work here. The attached link is the page I'm trying to parse. In this example, I specifically want to parse out the value "21.74" from the line I'm targeting. I have tried selecting table elements, but nothing seems to work. This is my first question, so any suggestions are much appreciated!
public static final String YAHOOLINK = new String("http://finance.yahoo.com/quote/MMM/key-statistics?p=");
private String yahooLink;
private Document rawYahooData;
private static String CLASSNAME = new String("W(100%) Pos(r)");
public YahooDataCollector(String aStockTicker){
yahooLink = new String(YAHOOLINK + aStockTicker);
try
{
rawYahooData = (Document) Jsoup.connect(yahooLink).timeout(10*1000).get();
Elements yahooElements = rawYahooData.getElementsByClass(CLASSNAME);
for(Element e : yahooElements)
{
System.out.println(e.text());
}
}
catch(IOException e)
{
System.out.println("Error Grabbing Raw Data For "+ aStockTicker);
}
}

Java - How do I extract Google News Titles and Links using Jsoup?

I am very new to jsoup and HTML. I was wondering how to extract the titles and links (if possible) from the stories on the front page of Google News. Here is my code:
org.jsoup.nodes.Document doc = null;
try {
doc = (org.jsoup.nodes.Document) Jsoup.connect("https://news.google.com/").get();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
Elements titles = doc.select("titletext");
System.out.println("Titles: " + titles.text());
//non existent
for (org.jsoup.nodes.Element e: titles) {
System.out.println("Title: " + e.text());
System.out.println("Link: " + e.attr("href"));
}
For some reason my program seems unable to find titletext, since this is the output when the code runs: Titles:
I would really appreciate your help, thanks.

First, get all elements with the h2 HTML tag:
Elements elem = html.select("h2");
Each of these elements has child elements and attributes (id, href, originalhref and so on). From these, retrieve the data you need:
for(Element e: elem){
System.out.println(e.select("[class=titletext]").text());
System.out.println(e.select("a").attr("href"));
}

Split a linked list nodes multiple data and show the resultant

I want to use a text file to populate a linked list in a Java SE app. The text file has the following format:
firstnamelastname mobile home office
I want to insert these lines as nodes in the linked list. Then I want to search the linked list for a specific node.
Using firstnamelastname as the key, I want to compare it with each node's data (the data has to be split in order to compare it with the key). After finding the node, I want to split its data using split(" ") and show the result.
Any hints on how to do this would be much appreciated.
I have written some Java code, but it is not working: it always gives me the last node's data. Please point out the mistake I made, or, if it is totally wrong, suggest an approach of your own.
try {
String fullname;
String mobile;
String home;
String mobile2;
String office;
Node current = first;
while (current.data != key) {
String splitter[] = current.data.split(" ");
fullname = splitter[0];
mobile = splitter[1];
home = splitter[2];
mobile2 = splitter[3];
office = splitter[4];
if (fullname == null ? key == null : fullname.equals(key)) {
mobilefield.setText(mobile);
homefield.setText(home);
mobilefield2.setText(mobile2);
officefield.setText(office);
} else {
throw new FileNotFoundException(
"SORRY RECORD NOT LISTED IN DATABASE");
}
break;
}
} catch (Exception e) {
JOptionPane.showMessageDialog(this, e.getMessage()
+ "\nPLEASE TRY AGAIN !", "Search Error",
JOptionPane.ERROR_MESSAGE);
}
(key is firstname + lastname)

I would use an ArrayList (it needs less memory and performs better for most operations).
The code looks OK to me, but I would not split current.data for every entry.
I would use a test like
if(current.data.startsWith(key)){
String splitter[]=current.data.split(" ");
...
}
and split current.data only if this is true, because only in that case do you need mobile, home, and office.
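Putting the startsWith test into a complete search loop might look like the sketch below. The asker's Node class is not shown, so the Node type here (with data and next fields) and the ContactList wrapper are assumptions. Two details differ from the snippet in the question: the loop advances with current = current.next instead of breaking after the first node, and strings are compared by content rather than with !=. The key is matched with a trailing space so that "jane" does not match "janedoe"; that is a small addition to the test suggested above.

```java
public class ContactList {
    // Hypothetical node type: one line of the text file per node.
    static final class Node {
        final String data;
        Node next;

        Node(String data) {
            this.data = data;
        }
    }

    private Node first;

    // Appends a line such as "johnsmith 111 222 333" to the end of the list.
    public void add(String line) {
        Node node = new Node(line);
        if (first == null) {
            first = node;
            return;
        }
        Node current = first;
        while (current.next != null) {
            current = current.next;
        }
        current.next = node;
    }

    // Returns the fields of the first record whose name equals key,
    // or null when no node matches.
    public String[] find(String key) {
        for (Node current = first; current != null; current = current.next) {
            // Split only the matching record, not every node visited.
            if (current.data.startsWith(key + " ")) {
                return current.data.split(" ");
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ContactList list = new ContactList();
        list.add("johnsmith 111 222 333");
        list.add("janedoe 444 555 666");
        String[] fields = list.find("janedoe");
        System.out.println(fields == null ? "not found" : String.join(", ", fields));
    }
}
```

Returning null for "not found" lets the caller decide whether to show the error dialog, and the message is shown once after the whole list has been searched rather than on the first non-matching node.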
