I have a Swing application that reads HTML pages using the following code:
String urlzip = null;
try {
    Document doc = Jsoup.connect(url).get();
    Elements links = doc.select("a[href]");
    for (Element link : links) {
        if (link.attr("abs:href").contains("BcfiHtm.zip")) {
            urlzip = link.attr("abs:href");
        }
    }
} catch (IOException e) {
    textAreaStatus.append("Failed to get new file from internet: " + e.getMessage() + "\n");
    e.printStackTrace();
}
return urlzip;
My Swing application then returns the string. It works fine and reads any HTML page I give it. However, the application sometimes fails with a timeout exception. How can I increase the timeout?
There's an example on this page.
Jsoup.connect("http://example.com").timeout(3000)
This error occurs while reading data: because of a large response or a connection problem, the request cannot complete in time. I would suggest increasing your timeout to at least one minute using the call above, so it becomes:
Jsoup.connect("http://example.com").timeout(60000);
When running the following code:
try {
    Document doc = Jsoup.connect("https://pomofocus.io/").get();
    Elements text = doc.select("div.sc-kEYyzF");
    System.out.println(text.text());
} catch (IOException e) {
    e.printStackTrace();
}
No output occurs. When changing the println to:
System.out.println(text.first().text());
I get a NullPointerException but nothing else.
jsoup doesn't execute JavaScript; it parses the HTML that the server returns. You can check View Source (as opposed to Inspect) in your browser to see the server's actual response and what is selectable.
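As a quick check, you can print what jsoup actually received; the URL and selector below are the ones from the question, and if the div is created by JavaScript the selector will match nothing:
Document doc = Jsoup.connect("https://pomofocus.io/").get();
System.out.println(doc.select("div.sc-kEYyzF").size()); // 0 if the div only exists after JavaScript runs
System.out.println(doc.html()); // the raw server response, same as View Source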
So I've been trying to build a web scraper, but some of the data I need is locked behind a reCAPTCHA. From what I've gathered scouring the internet, every captcha has a textarea element named 'g-recaptcha-response' that gets filled in as the captcha is completed. My current testing workaround is to solve the captcha manually, capture the response, and feed it back into the headless browser. However, I'm unable to get the response: as soon as the answer is submitted, the response element can no longer be found.
org.openqa.selenium.NoSuchElementException: no such element: Unable to locate element: {"method":"css selector","selector":"*[name='g-recaptcha-response']"}
public static String captchaSolver(String captchaUrl) {
    setUp();
    driver.get(captchaUrl);
    new WebDriverWait(driver, 2);
    try {
        // poll until the captcha response field has been filled in
        while (true) {
            String response = driver.findElement(By.name("g-recaptcha-response")).getText();
            if (response.length() != 0) {
                System.out.println(response);
                break;
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return "";
}
Try to find the element by CSS like this:
*[name*='g-recaptcha-response']
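For example, a minimal sketch assuming the same driver as in the question:
// *[name*=...] matches any element whose name attribute contains the string
WebElement field = driver.findElement(By.cssSelector("*[name*='g-recaptcha-response']"));
// for a textarea, the filled-in value is usually read via getAttribute("value") rather than getText()
String response = field.getAttribute("value");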
I'm writing a small program and I want to fetch an element from a website. I've followed many tutorials to learn how to write this code with jsoup. An example of what I'm trying to print is "Monday, November 19, 2018 - 3:00pm to 7:00pm". I'm running into this error:
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=https://my.cs.ubc.ca/course/cpsc-210
Here is my code:
public class WebPageReader {
    private String url = "https://my.cs.ubc.ca/course/cpsc-210";
    private Document doc;

    public void readPage() {
        try {
            doc = Jsoup.connect(url)
                    .userAgent("Mozilla/5.0")
                    .referrer("https://www.google.com")
                    .timeout(1000)
                    .followRedirects(true)
                    .get();
            Elements temp = doc.select("span.date-display-single");
            int i = 0;
            for (Element officeHours : temp) {
                i++;
                System.out.println(officeHours);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Thanks for the help.
Status 403 means your access is forbidden.
Please make sure you have access to https://my.cs.ubc.ca/course/cpsc-210
I have tried to access https://my.cs.ubc.ca/course/cpsc-210 from a browser, and it returns an error page. I think you need credentials to access it.
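If you want to inspect the server's verdict without jsoup throwing, ignoreHttpErrors(true) lets you read the status code; a minimal sketch against the same URL:
Connection.Response res = Jsoup.connect("https://my.cs.ubc.ca/course/cpsc-210")
        .userAgent("Mozilla/5.0")
        .ignoreHttpErrors(true) // don't throw HttpStatusException on 4xx/5xx
        .execute();
System.out.println(res.statusCode() + " " + res.statusMessage());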
I am working on a small app for myself, and I just don't understand why my code works in Eclipse but not on my phone using Android Studio.
public static ArrayList<Link> getLinksToChoose(String searchUrl) {
    ArrayList<Link> linkList = new ArrayList<>();
    try {
        System.out.println(searchUrl);
        Document doc = Jsoup.connect(searchUrl).timeout(3000).userAgent("Chrome").get();
        Elements links = doc.select("tr");
        links.remove(0);
        for (Element link : links) {
            Link newLink = new Link(getURL(link), getName(link), getLang(link));
            linkList.add(newLink);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return linkList;
}
The problem is that I can't even get the Document. I always get an HttpURLConnectionImpl on the line where I try to fetch the HTML document. I have read a bit about jsoup on Android. Some people suggest using AsyncTask, but it doesn't seem like that would solve my problem.
The loading of the content must happen outside the main thread, e.g. in an AsyncTask. On a desktop JVM (your Eclipse run) this restriction doesn't exist, but Android throws a NetworkOnMainThreadException when you do network I/O on the UI thread.
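A minimal sketch of moving the call off the main thread, assuming the getLinksToChoose method and searchUrl from the question (AsyncTask works too; any background thread satisfies the requirement):
ExecutorService executor = Executors.newSingleThreadExecutor();
executor.execute(() -> {
    // network I/O now happens on a worker thread
    ArrayList<Link> links = getLinksToChoose(searchUrl);
    // hand the result back to the UI thread, e.g. via runOnUiThread(...) or a Handler
});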
I have an HTML page that I am reading.
If the format I am looking for is not present on that page, I want to skip it and continue with the next page, but that is not working.
Can you please let me know what I am missing?
try
{
    Document doc = Jsoup.connect(urlget).get();
    Element tables = doc.select("div.itembody");
    websiteaddress = tables.text();
}
catch (IOException ee)
{
}
If the page does not contain itembody, I see an exception:
Exception in thread "main" java.lang.NullPointerException
I want the loop to continue rather than the program exiting when there is an exception.
doc.select returns an object of type Elements (a list of Element objects), not Element. If no element in your HTML matches the query, you get an empty list. Change your code to:
try
{
    Document doc = Jsoup.connect(urlget).get();
    Elements tables = doc.select("div.itembody");
    if (tables.isEmpty())
        noDivItembodyInHTML();
    else
        websiteaddress = tables.first().text();
}
catch (IOException ee)
{
    // at least log the failure instead of silently swallowing it
}
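Note that Elements.first() returns null when the list is empty, which is the usual source of the NullPointerException here; calling text() on the Elements list itself would simply return an empty string.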