Jsoup webscraping to find game odds data

I've been trying to write a Java program that fetches the odds for a game from a sportsbook such as FanDuel, but I've been running into a lot of problems. When I print the HTML for the site, I don't get the entire page, so I'm unable to drill into the divs and retrieve the data I actually want.
I used the URL https://sportsbook.fanduel.com/ . If I call something like Element element = doc.getElementById("root"); to get the data inside that div, the rest of the content in that div does not appear. Specifically, I would just like to get the moneyline data for each game. If anyone can help, that would be great.
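import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class ExtractSportsBookData {
    public static void extractData(String url){
        try{
            // Fetch the page and print the HTML exactly as Jsoup received it
            Document doc = Jsoup.connect(url).get();
            String html = doc.html();
            System.out.println(html);
        } catch (IOException e){
            e.printStackTrace();
        }
    }
}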
If you look at the screenshot, the moneyline data for each game is stored inside the li tags, but I cannot find a way to extract that data using Jsoup.
public class Main {
    public static void main(String[] args) {
        ExtractSportsBookData.extractData("https://sportsbook.fanduel.com/");
    }
}
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class ExtractSportsBookData {
    public static void extractData(String url){
        try{
            Document doc = Jsoup.connect(url).get();
            String html = doc.html();
            //System.out.println(html);
            // Class names copied from the browser's inspector
            Elements element = doc.getElementsByClass("jo jp fk fe jy jz bs");
            System.out.println(element.isEmpty());
        } catch (IOException e){
            e.printStackTrace();
        }
    }
}
This prints true, meaning that no elements were matched, which is not what I want. Any help on this would be appreciated.

Related

Concurrency for recursive webcrawler-algorithm in Java

I wrote a program in Java to find all pages of a website, starting from the URL of the start page (using Jsoup as the web crawler). It works for small websites but is too slow for sites with 200 or more pages:
public class SiteInspector {
    private ObservableSet<String> allUrlsOfDomain; // all URLs found for the site
    private Set<String> toVisit; // pages that were found but not visited yet
    private Set<String> visited; // URLs that were visited
    private List<String> invalid; // broken URLs
    public SiteInspector() {...}
    public void getAllWebPagesOfSite(String entry) // entry must be the start page of a site
    {
        toVisit.add(entry);
        allUrlsOfDomain.add(entry);
        while(!toVisit.isEmpty())
        {
            String next = popElement(toVisit);
            getAllLinksOfPage(next); // expensive
            toVisit.remove(next);
        }
    }
    public void getAllLinksOfPage(String pageURL) {
        try {
            if (urlIsValid(pageURL)) {
                visited.add(pageURL);
                Document document = Jsoup.connect(pageURL).get(); // connect to pageURL (expensive network operation)
                Elements links = document.select("a"); // get all links from the page
                for(Element link : links)
                {
                    String nextUrl = link.attr("abs:href"); // "http://..."
                    if(nextUrl.contains(new URL(pageURL).getHost())) // ignore URLs to external hosts
                    {
                        if(!isForbiddenForCrawlers(nextUrl)) // URLs forbidden by robots.txt
                        {
                            if(!visited.contains(nextUrl))
                            {
                                toVisit.add(nextUrl);
                            }
                        }
                        allUrlsOfDomain.add(nextUrl);
                    }
                }
            }
            else
            {
                invalid.add(pageURL); // URL validation fails
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
    private boolean isForbiddenForCrawlers(String url){...}
    private boolean urlIsValid(String url) {...}
    public String popElement(Set<String> set) {...}
}
I know I have to run the expensive network-operation in extra threads.
Document document = Jsoup.connect(pageURL).get(); //connect to pageURL
My problem is that I have no idea how to move this operation into other threads while keeping the sets consistent (how do I synchronize?). If possible, I want to use a ThreadPoolExecutor to control the number of threads started during the process. Do you have an idea how to solve this? Thanks in advance.

To use threads and still keep the sets consistent, give each worker thread its own result variable, initially empty. The thread fills it in when it is done, and the value is added to the Set only after the worker has finished.
A simple example of that could be:
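Main.class
List<WebDownloadThreadHandler> handlers = new ArrayList<>();
Semaphore barrier = new Semaphore(0); // starts with no permits; each worker releases one
for (String link : links) {
    WebDownloadThreadHandler handler = new WebDownloadThreadHandler(link, barrier);
    handlers.add(handler);
    new Thread(handler).start();
}
// Block until every worker has released its permit
barrier.acquireUninterruptibly(links.size());
// Only now is it safe to read the results, on a single thread
for (WebDownloadThreadHandler handler : handlers) {
    if (handler.getValidUrl() != null) {
        allUrlsOfDomain.add(handler.getValidUrl());
    }
}
WebDownloadThreadHandler.class
public class WebDownloadThreadHandler implements Runnable {
    private final String link;
    private String validUrl; // filled in by run(), read after the barrier
    private final Semaphore barrier;
    public WebDownloadThreadHandler(String link, Semaphore barrier) {
        this.link = link;
        this.validUrl = null;
        this.barrier = barrier;
    }
    public String getValidUrl() {
        return validUrl;
    }
    public void run () {
        try {
            Document document = Jsoup.connect(this.link).userAgent("Mozilla/5.0").get();
            Elements elements = document.select(YOUR CSS QUERY);
            /*
            YOUR JSOUP CODE GOES HERE, AND STORE THE VALID URL IN: this.validUrl = THE VALUE YOU GET;
            */
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Release even on failure so the main thread never waits forever
            this.barrier.release();
        }
    }
}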
What you are doing here is creating a thread for every page you want to collect links from and storing each result in that thread's own variable. If you want to retrieve more than one valid link from each page, store them in a Set per thread and append that Set to the global set afterwards. The key point is that, to keep your data consistent, each thread writes only to its own field (assigned through the this keyword), and the shared sets are updated only once the workers have finished.
Hope it helps! If you need anything else feel free to ask me!
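
Since the question asks specifically about limiting the number of threads, here is a minimal sketch of the same idea on top of a fixed-size pool (Executors.newFixedThreadPool) instead of one thread per link. Everything named in it is an assumption for illustration: the class CrawlSketch, the linksOf function (a stand-in for the Jsoup.connect(pageURL).get() call plus link extraction), and the small in-memory site used in main. The set of seen URLs comes from ConcurrentHashMap.newKeySet(), so seen.add(url) is an atomic check-and-insert and no synchronized block is needed.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CrawlSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet(); // thread-safe set
    private final AtomicInteger inFlight = new AtomicInteger();     // pages submitted but not finished
    private final CountDownLatch done = new CountDownLatch(1);      // opened when inFlight drops to 0
    private final ExecutorService pool;
    private final Function<String, List<String>> linksOf;           // hypothetical "fetch page, return its links"

    CrawlSketch(int threads, Function<String, List<String>> linksOf) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.linksOf = linksOf;
    }

    // Returns every URL reachable from entry; blocks until the crawl is complete.
    Set<String> crawl(String entry) {
        seen.add(entry);
        submit(entry);
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    private void submit(String url) {
        inFlight.incrementAndGet(); // counted before the task is queued
        pool.execute(() -> {
            try {
                for (String next : linksOf.apply(url)) {
                    if (seen.add(next)) { // true only for the first thread that finds this URL
                        submit(next);
                    }
                }
            } catch (RuntimeException e) {
                e.printStackTrace(); // a failing page must not stop the crawl
            } finally {
                // Children were counted above, so 0 means nothing is left anywhere.
                if (inFlight.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }

    public static void main(String[] args) {
        // In-memory stand-in for a website: page -> links found on it
        Map<String, List<String>> site = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/b", "/c"),
                "/b", List.of("/"),
                "/c", List.of());
        Set<String> pages = new CrawlSketch(4, url -> site.getOrDefault(url, List.of())).crawl("/");
        System.out.println(new TreeSet<>(pages)); // [/, /a, /b, /c]
    }
}
```

In real use, linksOf would wrap the Jsoup call together with the host and robots.txt checks from the question. The counter-plus-latch bookkeeping is just one way to detect that no pages are left in flight.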

Is there any way to parse this data from HTML content with Jsoup?

I really need to parse this div.fan-details data (added link of a picture which shows exactly what I need). But I just don't know how to do it. I tried the following code, but it didn't return anything.
public class App {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.younow.com/name").get();
        Elements info = doc.select("div.fan-details");
        for (Element fansInfo: info) {
            String str = fansInfo.toString();
            System.out.println(str);
        }
    }
}
Sorry, I just started using jsoup. Any help is appreciated!

JSoup Data Issue

I am trying to get some data from a website using Jsoup, and I am not sure how.
This is the code I have been using, and it does not work:
    public static Document doc;
    public static Elements elementPrice;
    public void getDocument()
    {
        try
        {
            doc = Jsoup.connect("https://steamcommunity.com/market/search?appid=730&q=ak47+jaguar+factory-new").get();
            elementPrice = doc.select("market_table_value");
            System.out.println(elementPrice);
        } catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}
I am trying to get data from this site: https://steamcommunity.com/market/search?appid=730&q=ak47+jaguar+factory-new
The data I am trying to get is this:
Pris från: (Swedish for "Price from:")
35,36€
which is the price of a CS:GO item on Steam.
I wonder why this doesn't work.
Thanks for any help! :)

select uses CSS selector syntax, so if you want to match elements by their class, use .className (note the dot at the start). Try:
elementPrice = doc.select(".market_table_value");
//                         ^-- add this dot
You can also use the getElementsByClass method instead of select and pass the class name directly, without any CSS syntax:
elementPrice = doc.getElementsByClass("market_table_value");

Parsing table elements with Jsoup

I'm trying to parse data from this table. Let's say, for example, that I want to parse the second elements from the second row (called SLO).
I can see there is a TR inside TR and the SLO word doesn't even have an ID or anything. How can I parse this?
This is the code:
class Title extends AsyncTask<Void, Void, Void> {
    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        tw1.setText("Loading...");
    }
    @Override
    protected Void doInBackground(Void... params) {
        try {
            Document doc = Jsoup.connect("https://www.easistent.com/urniki/cc45c5d0d303f954588402a186f5cdba5edb51d6/razredi/16515").get();
            Elements eles = doc.select("");
            title = eles.toString();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        return null;
    }
    @Override
    protected void onPostExecute(Void result) {
        super.onPostExecute(result);
        tw1.setText(title);
    }
}
I don't know what to put in the doc.select(""); because I've never parsed something like this. I've only parsed titles of webpages and such. Could someone help me with this?

There is plenty of information there for you to use, for example class names or title attributes. The URL you provided won't work for me, and I can't copy paste the HTML from your image so my example will show just the parsing of the span based on its title:
String html = "<span title='Slovenscina'>SLO</span>";
Document doc = Jsoup.parse(html);
Elements eles = doc.select("span[title=Slovenscina]");
String title = eles.text();
System.out.println(title);
Will output:
SLO
This will also work within the rest of the HTML that you provided. I suggest you read more about Jsoup's selector syntax.

Image loaded from a PHP script isn't displayed in ImageView

I want to fill an ImageView with an image saved on my localhost. I can display it in a browser fine but it doesn't get displayed in my ImageView.
PHP script to display image:
<?php
include('connect_db.php');
$id = $_GET['id'];
$path = 'Profile_Images/'.$id.'.jpg';
echo '<img src='.$path.' border=0>';
?>
Here is my android code to download an image from a URL:
    URL url;
    try {
        url = new URL("http://192.168.1.13/get_profile_image.php?id=145");
        new DownloadImage(this).execute(url);
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
@Override
public void imageDownloaded(final Bitmap downloadedImage) {
    runOnUiThread(new Runnable() {
        public void run() {
            ImageView imageView = (ImageView)findViewById(R.id.imageView);
            imageView.setImageBitmap(downloadedImage);
        }
    });
}
When I use the URL of some other image it works fine, but mine never gets loaded. Any ideas?
Also, the following code works, but I have a feeling it's bad practice:
URL url;
try {
    url = new URL("http://192.168.1.13/Profile_Images/145.jpg");
    new DownloadImage(this).execute(url);
} catch (MalformedURLException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Where 145 would be a variable.
Edit
Any reasons for the down-votes would be appreciated!

It's pretty simple, actually. When you send a request to http://192.168.1.13/get_profile_image.php?id=145, a string (<img src=Profile_Images/145.jpg border=0>) is sent back. Because the DownloadImage class doesn't parse HTML (it expects raw image data), it doesn't know where the actual image is. Two solutions:
1. Your 'bad practice' approach, requesting the image file directly.
2. Use this PHP script instead to echo the raw image data:
PHP:
$id = $_GET['id'];
$path = 'Profile_Images/'.$id.'.jpg';
$type = 'image/jpeg';
header('Content-Type:'.$type);
header('Content-Length: ' . filesize($path));
readfile($path);
EDIT: forgot to credit someone: the above code is taken from here (with adapted variable names): https://stackoverflow.com/a/1851856/1087848

Your PHP code will print out something along the lines of:
<img src=Profile_Images/145.jpg border=0>
Not only is this malformed HTML, it is also text output. Your other URL, http://192.168.1.13/Profile_Images/145.jpg, points to an image file, so the data you receive is an image, not an HTML string.
You should consider having your PHP return a JSON response containing the URL of the image for that ID, and then running DownloadImage on that URL. The advantage over a raw echo is that you can easily extend the solution to return other types of files, or even an array of files.
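
To make the difference between the two responses concrete, here is a small sketch that checks whether a response body starts with the JPEG signature (the bytes FF D8 FF) before handing it to an image decoder. The class name ResponseSniffer and the method looksLikeJpeg are made up for this illustration and are not part of the DownloadImage class from the question. The sample bytes stand in for real HTTP responses.

```java
import java.nio.charset.StandardCharsets;

class ResponseSniffer {
    // JPEG files begin with the three bytes FF D8 FF.
    static boolean looksLikeJpeg(byte[] body) {
        return body.length >= 3
                && (body[0] & 0xFF) == 0xFF
                && (body[1] & 0xFF) == 0xD8
                && (body[2] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        // What the original PHP script sends back: an HTML string
        byte[] htmlBody = "<img src=Profile_Images/145.jpg border=0>"
                .getBytes(StandardCharsets.US_ASCII);
        // The first bytes of a typical JPEG file, as sent for the direct image URL
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        System.out.println(looksLikeJpeg(htmlBody));  // false
        System.out.println(looksLikeJpeg(jpegStart)); // true
    }
}
```

A check like this only helps with diagnosing the problem. The fix is still one of the approaches above.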
