Get more than one Element JSoup Java Android

I am trying to get a list of items to form a playlist, but I am only able to retrieve one of the items. Here is the code I have in my RecyclerView's onBindViewHolder:
@Override
public void onBindViewHolder(PlaylistViewHolder holder, int position)
{
    try
    {
        String url = "https://www.c895.org/playlist";
        Document document = Jsoup.connect(url).get();
        Element playlist = document.select("#playlist").first();
        List<TrackInfo> tracks = new ArrayList<>();
        for (Element track : playlist.children())
        {
            long time = Long.parseLong(track.dataset().get("ts"));
            String title = track.select(".title").text();
            String artist = track.select(".artist").text();
            tracks.add(new TrackInfo(new Date(time * 1000), title, artist));
        }
        for (int i = 0; i < tracks.size() - 1; i++)
        {
            holder.titlesView.setText(tracks.get(i).toString());
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
Ideally I'd like to get about 10-20 results. Is there any way I could do this?

It's because the HTML part that you need is in the following tag:
<div id="playlist">
</div>
So you can't keep only the first match:
Element playlist = document.select("#playlist").first();
Instead, select div#playlist and keep the whole Elements collection, which holds every playlist item:
Elements playlist = document.select("div#playlist");

Related

Getting Stale Element Reference Exception while accessing the element from a list

Below is the code which I am trying to get to work.
// Method to fetch all links from the sitemap container
public void GetAllLinks() {
    WebElement pointer = LinksContainer;
    String url = "";
    List<WebElement> allURLs = pointer.findElements(By.tagName("a"));
    System.out.println("Total links on the page: " + allURLs.size());
    for (int i = 0; i < allURLs.size(); i++) {
        WebElement link = allURLs.get(i);
        url = link.getAttribute("href");
        OpenAllLinks(url);
    }
}

// Method to hit all the fetched URLs
public void OpenAllLinks(String linkURL) {
    driver.get(linkURL);
}
I am fetching all the anchor elements from a sitemap page and putting those elements into a list. Then I get the URL from each element using getAttribute("href"). The code works fine up to that point.
However, I then take these URLs and pass them as arguments into the method OpenAllLinks() to open them one by one with driver.get(). The code works for the first link, but as soon as the first page loads, it throws a stale element reference exception.
The moment you leave the page where all these links appear, every web element in the allURLs list becomes stale.
What you can do is first extract and save all the links in a list (the strings, not the web elements), and then iterate over that list, opening each link.
Like this:
public void GetAllLinks() {
    WebElement pointer = LinksContainer;
    String url = "";
    List<WebElement> allURLs = pointer.findElements(By.tagName("a"));
    System.out.println("Total links on the page: " + allURLs.size());
    List<String> links = new ArrayList<>();
    for (int i = 0; i < allURLs.size(); i++) {
        WebElement link = allURLs.get(i);
        url = link.getAttribute("href");
        links.add(url);
    }
    for (int i = 0; i < links.size(); i++) {
        OpenLink(links.get(i));
    }
}

// Method to open the fetched URLs
public void OpenLink(String linkURL) {
    driver.get(linkURL);
}
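The same idea reads a little tighter with streams. A sketch, assuming Java 8+ (it needs java.util.stream.Collectors) and the same LinksContainer and driver fields from the question; the invariant is unchanged: no WebElement is touched after the page it came from has been left.
List<String> links = LinksContainer.findElements(By.tagName("a"))
        .stream()
        .map(a -> a.getAttribute("href"))
        .collect(Collectors.toList());
links.forEach(driver::get);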

Price extraction in Java

I am trying to create a Discord bot that searches for an item entered by the user as "!price item" and then gives me a price that I can work with later in the code. I figured out how to get the HTML into a string or a Document, but I am struggling to find a way to extract only the prices.
Here is the code:
@Override
public void onMessageReceived(MessageReceivedEvent event) {
    String html;
    System.out.println("I received a message from " +
            event.getAuthor().getName() + ": " +
            event.getMessage().getContentDisplay());
    if (event.getMessage().getContentRaw().contains("!price")) {
        String input = event.getMessage().getContentDisplay();
        String item = input.substring(9).replaceAll(" ", "%20");
        String URL = "https://www.google.lt/search?q=" + item + "%20price";
        try {
            html = Jsoup.connect(URL).userAgent("Mozilla/49.0").get().html();
            html = html.replaceAll("[^\\ ,.£€eur0123456789]", " ");
        } catch (Exception e) {
            return;
        }
        System.out.println(html);
    }
}
The biggest problem is that I am using Google search, so the prices are not in the same place in the HTML. Is there a way I can extract only (numbers + EUR) or (a euro sign + price) from the HTML?
You can do that by scraping the website. Here's a simple working example of what you are looking for using JSoup:
public class Main {
    public static void main(String[] args) {
        try {
            String query = "oneplus";
            String url = "https://www.google.com/search?q=" + query + "%20price&client=firefox-b&source=lnms&tbm=shop&sa=X";
            int pricesToRetrieve = 3;
            ArrayList<String> prices = new ArrayList<String>();
            Document document = Jsoup.connect(url).userAgent("Mozilla/5.0").get();
            Elements elements = document.select("div.pslires");
            for (Element element : elements) {
                String price = element.select("div > div > b").text();
                String[] finalPrice = price.split(" ");
                prices.add(finalPrice[0] + finalPrice[1]);
                pricesToRetrieve -= 1;
                if (pricesToRetrieve == 0) {
                    break;
                }
            }
            System.out.println(prices);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
That piece of code will output:
[347,10€, 529,90€, 449,99€]
If you want to retrieve more information, just connect JSoup to the Google Shopping URL with your desired query and scrape it. In this case I scraped Google Shopping for the OnePlus to check its prices, but you can also get the URL to buy it, the full product name, and so on. This snippet retrieves the first three prices indexed in Google Shopping and adds them to an ArrayList of String; before adding each one, it splits the retrieved text on the space character so only the price itself is kept.
This is a simple scraping example; if you need anything else, feel free to ask! And if you want to learn more about scraping with JSoup, check this link.
Hope this helped you!
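If you still want the literal "numbers + EUR" extraction from raw text that the question asked about, a regex pass is one way to do it. This is a sketch: the pattern is an assumption about the formats in play (e.g. 347,10€ or 529 EUR) and will need tuning for other locales or sites.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PriceExtractor {
    // Matches amounts like "347,10€", "449.99 €", or "529 EUR"
    private static final Pattern PRICE = Pattern.compile(
            "\\d{1,3}(?:[ .]\\d{3})*(?:[.,]\\d{2})?\\s*(?:€|EUR)",
            Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String text) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(text);
        while (m.find()) {
            prices.add(m.group().trim());
        }
        return prices;
    }

    public static void main(String[] args) {
        // Prints [347,10€, 529 EUR]
        System.out.println(extract("OnePlus from 347,10€ or 529 EUR elsewhere"));
    }
}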

How to loop inside containers to select a value in Selenium?

I have a product page with the sizes inside containers. I tried to list the elements and get the size by text, but the list always returns zero elements; I tried the XPath of the parent and of the child and I get the same error. How can I list the sizes and select a specific size?
public void chooseSize(String size) {
    String selectedSize;
    List<WebElement> sizesList = actions.driver.findElements(By.xpath("SelectSizeLoactor"));
    try {
        for (int i = 0; i < sizesList.size(); i++) {
            if (sizesList.get(i).getText().toLowerCase().contains(size.toLowerCase())) {
                selectedSize = sizesList.get(i).getText();
                sizesList.get(i).click();
                assertTrue(selectedSize.equals(size));
            }
        }
    } catch (Exception e) {
        Assert.fail("Couldn't select size cause of " + e.getMessage());
    }
}
It looks to me like the proper selector would be:
actions.driver.findElements(By.cssSelector(".SizeSelection-option"))
Try one of the options below:
List<WebElement> sizesList = actions.driver.findElements(By.xpath("//*[@class='SelectSizeLoactor']"));
List<WebElement> sizesList = actions.driver.findElements(By.cssSelector(".SelectSizeLoactor"));
I found a quick solution: I used part of the XPath with text(), passed the size value in as that text, then appended the rest of the XPath, and it worked!
String SelectSizeLoactor = "//button[text()='";

public void chooseSize(String size) {
    String selectedSize;
    WebElement sizeLocator = actions.driver.findElement(By.xpath(SelectSizeLoactor + size.toUpperCase() + "']"));
    try {
        if (sizeLocator.getText().toUpperCase().contains(size.toUpperCase())) {
            selectedSize = sizeLocator.getText();
            sizeLocator.click();
            assertTrue(selectedSize.equals(size));
        }
    } catch (Exception e) {
        Assert.fail("Couldn't select size cause of " + e.getMessage());
    }
}

Unable to parse element attribute with XOM

I'm attempting to parse an RSS feed using the XOM Java library. Each entry's image URL is stored as an attribute of the <img> element, as seen below.
<rss version="2.0">
  <channel>
    <item>
      <title>Decision Paralysis</title>
      <link>https://xkcd.com/1801/</link>
      <description>
        <img src="https://imgs.xkcd.com/comics/decision_paralysis.png"/>
      </description>
      <pubDate>Mon, 20 Feb 2017 05:00:00 -0000</pubDate>
      <guid>https://xkcd.com/1801/</guid>
    </item>
  </channel>
</rss>
Calling .getFirstChildElement("img") returns null, making my code crash when I try to retrieve the src attribute. Why is my program failing to read the <img> element, and how can I read it in properly?
import nu.xom.*;

public class RSSParser {
    public static void main(String[] args) {
        try {
            Builder parser = new Builder();
            Document doc = parser.build("https://xkcd.com/rss.xml");
            Element rootElement = doc.getRootElement();
            Element channelElement = rootElement.getFirstChildElement("channel");
            Elements itemList = channelElement.getChildElements("item");
            // Iterate through itemList
            for (int i = 0; i < itemList.size(); i++) {
                Element item = itemList.get(i);
                Element descElement = item.getFirstChildElement("description");
                Element imgElement = descElement.getFirstChildElement("img");
                // Crashes with NullPointerException
                String imgSrc = imgElement.getAttributeValue("src");
            }
        } catch (Exception error) {
            error.printStackTrace();
            System.exit(1);
        }
    }
}
There is no img element in the item. Try:
if (imgElement != null) {
    String imgSrc = imgElement.getAttributeValue("src");
}
What the item contains is this:
<description><img
  src="http://imgs.xkcd.com/comics/us_state_names.png"
  title="Technically DC isn't a state, but no one is too pedantic about it because they don't want to disturb the snakes."
  alt="Technically DC isn't a state, but no one is too pedantic about it because they don't want to disturb the snakes." />
</description>
That's not an img element. It's plain text: the markup inside description is escaped in the actual feed, so XOM sees it as character data.
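Since the description's value is escaped text rather than child elements, another option is to re-parse that text as XML. This is a sketch: it works for this particular feed because xkcd embeds a self-closing, XHTML-style <img/>; the <root> wrapper below is only there to guarantee a single root element, and the snippet belongs inside the existing try block.
String desc = item.getFirstChildElement("description").getValue();
// Re-parse the escaped markup from the description's text value
Document descDoc = new Builder().build("<root>" + desc + "</root>", null);
Element img = descDoc.getRootElement().getFirstChildElement("img");
if (img != null) {
    String imgSrc = img.getAttributeValue("src");
}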
I managed to come up with a somewhat hacky solution using regex and pattern matching.
// Iterate through itemList
for (int i = 0; i < itemList.size(); i++) {
    Element item = itemList.get(i);
    String descString = item.getFirstChildElement("description").getValue();
    // Parse image URL (hacky)
    String imgSrc = "";
    Pattern pattern = Pattern.compile("src=\"[^\"]*\"");
    Matcher matcher = pattern.matcher(descString);
    if (matcher.find()) {
        imgSrc = descString.substring(matcher.start() + 5, matcher.end() - 1);
    }
}

Java Android very large XML parsing

I have a very large XML file with categories, which maps to subcategories in another XML file by category id. The XML file with only category ids and names loads fast, but the XML file that has subcategories with image paths, descriptions, latitude/longitude, etc. takes a long time to load.
I am using javax.xml package and org.w3c.dom package.
The list action loads the file on each click to look up subcategories.
Is there any way to make this whole process faster?
Edit 1
Here's the code I am using to fetch subcategories:
Document doc = this.builder.parse(inStream, null);
doc.getDocumentElement().normalize();
NodeList pageList = doc.getElementsByTagName("page");
final int length = pageList.getLength();
for (int i = 0; i < length; i++)
{
    boolean inCategory = false;
    Element categories = (Element) getChild(pageList.item(i), "categories");
    if (categories != null)
    {
        NodeList categoryList = categories.getElementsByTagName("category");
        for (int j = 0; j < categoryList.getLength(); j++)
        {
            if (Integer.parseInt(categoryList.item(j).getTextContent()) == catID)
            {
                inCategory = true;
                break;
            }
        }
    }
    if (inCategory)
    {
        final NamedNodeMap attr = pageList.item(i).getAttributes();
        // get page ID
        final int categoryID = Integer.parseInt(getNodeValue(attr, "id"));
        // get page name
        final String categoryName = (getChild(pageList.item(i), "title") != null) ? getChild(pageList.item(i), "title").getTextContent() : "Untitled";
        // get thumbnail
        final NamedNodeMap thumb_attr = getChild(pageList.item(i), "thumbnail").getAttributes();
        final String categoryImage = "placethumbs/" + getNodeValue(thumb_attr, "file");
        //final String categoryImage = "androidicon.png";
        Category category = new Category(categoryName, categoryID, categoryImage);
        this.list.add(category);
        Log.d(tag, category.toString());
    }
}
Use a SAX-based parser; DOM is not good for large XML.
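A sketch of what a streaming version could look like with the SAX support that ships on Android (org.xml.sax plus javax.xml.parsers). The element names follow the pages/categories structure in the question's code; it only collects page titles for one category id and assumes every <category> holds a numeric value:
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class CategoryHandler extends DefaultHandler {
    private final int catID;
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String currentTitle;
    private boolean inCategory;

    public CategoryHandler(int catID) {
        this.catID = catID;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // reset the text buffer for each new element
        if ("page".equals(qName)) {
            currentTitle = null;
            inCategory = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("title".equals(qName)) {
            currentTitle = text.toString();
        } else if ("category".equals(qName)) {
            // assumes numeric category ids, as in the question's code
            if (Integer.parseInt(text.toString().trim()) == catID) {
                inCategory = true;
            }
        } else if ("page".equals(qName) && inCategory) {
            titles.add(currentTitle != null ? currentTitle : "Untitled");
        }
    }

    public static List<String> titlesForCategory(InputStream in, int catID) throws Exception {
        CategoryHandler handler = new CategoryHandler(catID);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.titles;
    }
}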
Maybe a SAX processor would be quicker (assuming your app is slowing down due to the memory requirements of a DOM-style approach?).
Article on processing XML on android
SOF question about SAX processing on Android
