Getting an error using JSoup. Why? - java

I'm trying to log in to a fantasy football website and extract data from it.
I get the following error,
Jul 24, 2015 8:01:12 PM StatsCollector main
SEVERE: null
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=http://fantasy.premierleague.com/
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:537)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:493)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:205)
at StatsCollector.main(StatsCollector.java:26)
whenever I try this code. Where am I going wrong?
import java.io.IOException;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StatsCollector {
    public static void main(String[] args) {
        try {
            String url = "http://fantasy.premierleague.com/";
            Connection.Response response = Jsoup.connect(url).method(Connection.Method.GET).execute();
            Response res = Jsoup
                    .connect(url)
                    .data("ismEmail", "example@googlemail.com", "id_password", "examplepassword")
                    .method(Method.POST)
                    .execute();
            Map<String, String> loginCookies = res.cookies();
            Document doc = Jsoup.connect("http://fantasy.premierleague.com/transfers")
                    .cookies(loginCookies)
                    .get();
            String title = doc.title();
            System.out.println(title);
        } catch (IOException ex) {
            Logger.getLogger(StatsCollector.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}

Response res = Jsoup
        .connect(url)
        .data("ismEmail", "example@googlemail.com", "id_password", "examplepassword")
        .method(Method.POST)
        .execute();
Are you actually running this exact code? It looks like example code with placeholders instead of real login credentials, which would explain the HTTP 403 error you received.
Edit 1
My bad. I took a look at the login form on that site, and it seems you confused the id of the input elements ("ismEmail" and "id_password") with the name that gets sent with the form ("email" and "password"). Does this work for you?
Response res = Jsoup
        .connect(url)
        .data("email", "example@googlemail.com", "password", "examplepassword")
        .method(Method.POST)
        .execute();
Edit 2
Okay, this was stuck in my head, because signing in to a website with JSoup should not be that hard. I created an account there and tried it myself. Code first:
String url = "https://users.premierleague.com/PremierUser/j_spring_security_check";
// Post the credentials to the form's real target, without following the redirect.
Response res = Jsoup
        .connect(url)
        .followRedirects(false)
        .timeout(2_000)
        .data("j_username", "<USER>")
        .data("j_password", "<PASSWORD>")
        .method(Method.POST)
        .execute();
// Reuse the session cookies from the login response.
Map<String, String> loginCookies = res.cookies();
Document doc = Jsoup.connect("http://fantasy.premierleague.com/squad-selection/")
        .cookies(loginCookies)
        .get();
So what is happening here? First, I realized that the target of the login form was wrong. The page seems to be built on Spring, so the form target and field names use the Spring Security defaults: j_spring_security_check, j_username and j_password. Then I kept getting a read timeout until I set followRedirects(false). I can only guess why this helped; maybe it is a protection against crawlers?
Finally, I connect to the squad selection page, and the parsed response contains my personal view and data. This code works for me; would you give it a try?
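To make the cookie hand-off in the snippet above more concrete, here is a minimal, library-free sketch of what collecting cookies from a login response and sending them with the next request amounts to. The class and method names (SetCookieSketch, parseSetCookies, toCookieHeader) and the sample header values are hypothetical, and this is a simplification for illustration, not Jsoup's actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SetCookieSketch {

    // Reduces raw Set-Cookie header values to a name -> value map.
    // Attributes after the first ';' (Path, HttpOnly, ...) are ignored.
    static Map<String, String> parseSetCookies(List<String> setCookieHeaders) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            int semicolon = header.indexOf(';');
            String pair = (semicolon >= 0 ? header.substring(0, semicolon) : header).trim();
            int equalsAt = pair.indexOf('=');
            if (equalsAt <= 0) {
                continue; // no cookie name, nothing usable
            }
            cookies.put(pair.substring(0, equalsAt).trim(), pair.substring(equalsAt + 1).trim());
        }
        return cookies;
    }

    // Builds the value of the Cookie request header from that map.
    static String toCookieHeader(Map<String, String> cookies) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> cookie : cookies.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(cookie.getKey()).append('=').append(cookie.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up header values, for illustration only.
        List<String> headers = Arrays.asList(
                "JSESSIONID=abc123; Path=/; HttpOnly",
                "lang=en; Path=/");
        Map<String, String> cookies = parseSetCookies(headers);
        System.out.println(cookies);
        System.out.println(toCookieHeader(cookies));
    }
}
```

If the login response is a redirect, the session cookie is set on that redirect response itself, which is one reason reading the cookies from the unfollowed response can be convenient.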

Related

Can't get Discord user ID using GET with Jsoup

I'm trying to use Jsoup to make an HTTP GET request to the Discord API to get a user's information, like this. What am I doing wrong?
thread {
    val id = "MY_ID:"
    val token = "MY_TOKEN_BOT"
    val source = Jsoup.connect("https://discord.com/api/v9/users/${id}")
        .header("Authorization", "Bot $token")
        .ignoreContentType(true)
        .method(Connection.Method.GET)
        .execute()
        .body()
    runOnUiThread {
        binding.txv.text = JSONObject(source).toString()
    }
}
Response code GET 403
org.jsoup.HttpStatusException: HTTP error fetching URL. Status=403, URL=[https://discord.com/api/v9/users/264097054047862794]
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:890)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:829)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:366)
at com.example.jsouptester.MainActivity$testing$1.invoke(MainActivity.kt:80)
at com.example.jsouptester.MainActivity$testing$1.invoke(MainActivity.kt:62)
at kotlin.concurrent.ThreadsKt$thread$thread$1.run(Thread.kt:30)

Getting bad response using JRAW

I am trying to read data from reddit using java. I am using JRAW.
Here is my code:
public class Main {
    public static void main(String args[]) {
        System.out.println('a');
        String username = "dummyName";
        UserAgent userAgent = new UserAgent("crawl", "com.example.crawl", "v0.1", username);
        Credentials credentials = Credentials.script(username, <password>, <clientID>, <client-secret>);
        NetworkAdapter adapter = new OkHttpNetworkAdapter(userAgent);
        RedditClient reddit = OAuthHelper.automatic(adapter, credentials);
        Account me = reddit.me().about();
        System.out.println(me.getName());
        SubmissionReference submission = reddit.submission("https://www.reddit.com/r/diabetes/comments/9rlkdm/shady_insurance_work_around_to_pay_for_my_dexcom/");
        RootCommentNode rcn = submission.comments();
        System.out.println(rcn.getDepth());
        System.out.println();
        // Submission submission1 = submission.inspect();
        // System.out.println(submission1.getSelfText());
        // System.out.println(submission1.getUrl());
        // System.out.println(submission1.getTitle());
        // System.out.println(submission1.getAuthor());
        // System.out.println(submission1.getCreated());
        System.out.println("-----------------------------------------------------------------");
    }
}
I am making two requests so far: the first is reddit.me().about(); and the second is reddit.submission("https://www.reddit.com/r/diabetes/comments/9rlkdm/shady_insurance_work_around_to_pay_for_my_dexcom/");
The output is:
a
[1 ->] GET https://oauth.reddit.com/api/v1/me?raw_json=1
[<- 1] 200 application/json: '{"is_employee": false, "seen_layout_switch": true, "has_visited_new_profile": false, "pref_no_profanity": true, "has_external_account": false, "pref_geopopular": "GL(...)
dummyName
[2 ->] GET https://oauth.reddit.com/comments/https%3A%2F%2Fwww.reddit.com%2Fr%2Fdiabetes%2Fcomments%2F9rlkdm%2Fshady_insurance_work_around_to_pay_for_my_dexcom%2F?sort=confidence&sr_detail=false&(...)
[<- 2] 400 application/json: '{"message": "Bad Request", "error": 400}'
Exception in thread "main" net.dean.jraw.ApiException: API returned error: 400 (Bad Request), relevant parameters: []
at net.dean.jraw.models.internal.ObjectBasedApiExceptionStub.create(ObjectBasedApiExceptionStub.java:57)
at net.dean.jraw.models.internal.ObjectBasedApiExceptionStub.create(ObjectBasedApiExceptionStub.java:33)
at net.dean.jraw.RedditClient.request(RedditClient.kt:186)
at net.dean.jraw.RedditClient.request(RedditClient.kt:219)
at net.dean.jraw.RedditClient.request(RedditClient.kt:255)
at net.dean.jraw.references.SubmissionReference.comments(SubmissionReference.kt:50)
at net.dean.jraw.references.SubmissionReference.comments(SubmissionReference.kt:28)
at Main.main(Main.java:36)
Caused by: net.dean.jraw.http.NetworkException: HTTP request created unsuccessful response: GET https://oauth.reddit.com/comments/https%3A%2F%2Fwww.reddit.com%2Fr%2Fdiabetes%2Fcomments%2F9rlkdm%2Fshady_insurance_work_around_to_pay_for_my_dexcom%2F?sort=confidence&sr_detail=false&raw_json=1 -> 400
... 6 more
As can be seen, my first request returns my username, but the second request fails with a 400 Bad Request error.
To check whether my client ID and client secret were working correctly I did the same request using python PRAW library.
import praw
from praw.models import MoreComments
reddit = praw.Reddit(client_id=<same-as-in-java>, client_secret=<same-as-in-java>,
password=<same-as-in-java>, user_agent='crawl',
username="dummyName")
submission = reddit.submission(
url='https://www.reddit.com/r/redditdev/comments/1x70wl/how_to_get_all_replies_to_a_comment/')
print(submission.selftext)
print(submission.url)
print(submission.title)
print(submission.author)
print(submission.created_utc)
print('-----------------------------------------------------------------')
This gives the desired result without any errors so the client secret details must be working.
The only doubt I have is about how the user agent is created in Java: UserAgent userAgent = new UserAgent("crawl", "com.example.crawl", "v0.1", username);
I followed the following link.
What exactly do the target platform, the unique ID, and the version mean? I tried to keep the same format as in the link and used the same username as everywhere else. In Python, on the other hand, the user_agent was just the string crawl.
Please tell me if I am missing anything and what the issue could be.
Thank you
P.S. I want to do this in Java, not Python.
Since your first query works, the credentials are correct. In JRAW, pass only the submission id to the submission function, not the whole URL.
Change this
SubmissionReference submission = reddit.submission("https://www.reddit.com/r/diabetes/comments/9rlkdm/shady_insurance_work_around_to_pay_for_my_dexcom/");
to this
SubmissionReference submission = reddit.submission("9rlkdm");
where the id is the short string that follows /comments/ in the URL.
Hope this helps.
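If the input arrives as a full permalink, the id can be cut out before it is handed to reddit.submission(...). The following is a minimal sketch using only the Java standard library; the class name SubmissionIds and the helper extractSubmissionId are hypothetical and not part of JRAW, and the pattern assumes that ids consist of letters and digits only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubmissionIds {

    // Matches the path segment that follows "/comments/" in a Reddit permalink.
    private static final Pattern COMMENTS_ID = Pattern.compile("/comments/([A-Za-z0-9]+)");

    // Returns the submission id from a permalink, or the input unchanged
    // if it does not look like a permalink (for example, already a bare id).
    static String extractSubmissionId(String urlOrId) {
        Matcher m = COMMENTS_ID.matcher(urlOrId);
        return m.find() ? m.group(1) : urlOrId;
    }

    public static void main(String[] args) {
        String url = "https://www.reddit.com/r/diabetes/comments/9rlkdm/"
                + "shady_insurance_work_around_to_pay_for_my_dexcom/";
        System.out.println(extractSubmissionId(url));
    }
}
```

The result would then be passed on as reddit.submission(SubmissionIds.extractSubmissionId(url)).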

Cannot maintain a PHP session in JSoup

I'm trying to retrieve information from a catalog using JSoup. Each row always has 9 columns. The 6th column is a placeholder that is only filled in when you are logged in to the site, in which case it shows the price.
I have the following: (username and password not shown here)
Document doc = null;
String url;
Response res = Jsoup.connect("https://www.prisa.cl/home/?page=iniciaSesion")
        .method(Method.GET)
        .timeout(10000)
        .execute();
String sessionID = res.cookie("PHPSESSID");
System.out.println(sessionID);
res = Jsoup.connect("https://www.prisa.cl/home/?page=iniciaSesion")
        .data("email_address", username, "password", password)
        .method(Method.POST)
        .timeout(10000)
        .execute();
sessionID = res.cookie("PHPSESSID");
System.out.println(sessionID);
for (int page = 1; page <= 1; page++) {
    url = "https://www.prisa.cl/catalog/advanced_search_result.php"
            + "?keywords=%20&enviar=&categories_id=&manufacturers_id=&pfrom=&pto=&sort=2a&&page=" + page;
    doc = Jsoup.connect(url)
            .cookie("PHPSESSID", sessionID)
            .timeout(10000)
            .get();
    for (Element table : doc.select("table table table table table")) {
        for (Element row : table.select("tr")) {
            Elements tds = row.select("td");
            if (tds.size() == 9) {
                System.out.println(tds.select("img[src]").attr("src") + ";" +
                        tds.get(1).text() + ";" +
                        tds.get(2).text() + ";" +
                        tds.get(3).text() + ";" +
                        tds.get(4).text() + ";" +
                        tds.get(5).text() + ";" +
                        tds.get(6).text());
            } //end if
        } //rows
    } //tables
    System.out.println("finished page: " + page);
} //pages
What I think (or hope) is happening here is:
1- I get the PHPSESSID cookie while not logged in (for debugging purposes)
2- I get the PHPSESSID again after logging in (it has a different value)
3- I iterate over each page in the catalog (only 1 in the code above) and send the PHPSESSID cookie with the request, to retrieve the data as a logged-in user
4- I look for a TR that has 9 TDs, nested 5 tables deep (the page layout is a little confusing)
I am super new to this, but I searched Stack Overflow and the JSoup documentation for a couple of days and tried several different methods, to no avail.
What am I doing wrong?
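One thing that may be worth checking in the flow above is whether every request sends all cookies received so far, instead of a single PHPSESSID value tracked by hand. Below is a minimal, library-free sketch of such a cookie jar; the class CookieJarSketch and its methods are hypothetical, and the cookie values are made up. With Jsoup, the maps would come from res.cookies() and the snapshot would be passed to .cookies(...). Whether this resolves the problem on this particular site is an assumption the post does not confirm.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class CookieJarSketch {

    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Records the cookies of one response; a newer value replaces an older one,
    // and cookies the response does not mention are kept as they are.
    void update(Map<String, String> responseCookies) {
        cookies.putAll(responseCookies);
    }

    // Copy of the current cookies, to send with the next request.
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(cookies);
    }

    public static void main(String[] args) {
        CookieJarSketch jar = new CookieJarSketch();

        // Step 1: the GET of the login page hands out an anonymous session (made-up value).
        jar.update(Collections.singletonMap("PHPSESSID", "anonymous-session"));

        // Step 2: a login response that sets no new cookie must not wipe the session.
        jar.update(Collections.<String, String>emptyMap());

        // Step 3: every catalog request sends whatever the jar holds at that point.
        System.out.println(jar.snapshot());
    }
}
```

The design choice here is that a response which sets no cookie leaves the jar untouched, whereas overwriting a single variable with res.cookie("PHPSESSID") would replace it with null in that case.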

Cannot login to website by using JSOUP with x-www-form-urlencoded parameters

How can I implement the following request by using Jsoup?
POST /login/user HTTP/1.1
Host: url.publishedprices.co.il
Cache-Control: no-cache
Content-Type: application/x-www-form-urlencoded
username=readonly&password=123456&csrftoken=wohewqfDrcK2JMK5w7BKw4jCuMOiARnDg01Rw4VZdQ%3D%3D
I've tried the following code, but it doesn't work; the site responds with the error
Did not receive expected security token
I'm using this code:
Document welcomePage = Jsoup.connect("https://url.publishedprices.co.il/login").get();
Element inputHidden = welcomePage.getElementById("csrftoken");
String securityTokenKey = inputHidden.attr("name");
String securityTokenValue = inputHidden.attr("value");
Connection.Response res2 = Jsoup.connect("https://url.publishedprices.co.il/login/user")
.header("Content-Type","application/x-www-form-urlencoded;charset=UTF-8")
.data("username", "readonly")
.data("password", "123456")
.data(securityTokenKey, securityTokenValue)
.method(Method.POST)
.execute();
System.out.println(res2.body());
Map<String, String> loginCookies = res2.cookies();
I know that with x-www-form-urlencoded the values need to be URL-encoded, but I assumed that JSoup does this for me when I set the correct header. Am I wrong?
Thank you.
You should pass the cookie (which contains the session with the secret token), so that the CSRF protection on server side will be able to compare the tokens and grant you access.
// Load the login page first to obtain both the CSRF token and the session cookies.
Connection.Response res1 = Jsoup.connect("https://url.publishedprices.co.il/login").method(Method.GET).execute();
Document welcomePage = res1.parse();
Map<String, String> welcomeCookies = res1.cookies();
Element inputHidden = welcomePage.getElementById("csrftoken");
String securityTokenKey = inputHidden.attr("name");
String securityTokenValue = inputHidden.attr("value");
// Send the token together with the cookies of the session it belongs to.
Connection.Response res2 = Jsoup.connect("https://url.publishedprices.co.il/login/user")
        .header("Content-Type", "application/x-www-form-urlencoded;charset=UTF-8")
        .data("username", "readonly")
        .data("password", "123456")
        .data(securityTokenKey, securityTokenValue)
        .cookies(welcomeCookies)
        .method(Method.POST)
        .execute();
System.out.println(res2.body());
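On the encoding question from the original post: the pairs passed to .data(...) are given unencoded, and the encoded body is produced from them when the request is sent. The following minimal sketch shows that encoding step with the Java standard library only (Java 10 or later for this URLEncoder overload). The class FormBodySketch and the method encodeForm are hypothetical, and the raw token is simply the decoded form of the value shown in the request above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {

    // Encodes key/value pairs as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "readonly");
        fields.put("password", "123456");
        // Raw (unencoded) token; the trailing "==" is what becomes %3D%3D.
        fields.put("csrftoken", "wohewqfDrcK2JMK5w7BKw4jCuMOiARnDg01Rw4VZdQ==");
        System.out.println(encodeForm(fields));
    }
}
```

This suggests the token read from the hidden input should be passed on as it is; encoding it by hand before handing it to .data(...) would risk encoding it twice.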

Jsoup: error 307 when trying to access a page

I'm trying to access the page http://www.betbrain.com with jsoup, but this gives me error 307. Does anyone know how I can fix this?
String sURL = "http://www.betbrain.com";
Connection.Response res = Jsoup.connect(sURL).timeout(5000).ignoreHttpErrors(true).followRedirects(true).execute();
HTTP status code 307 is not an error; it indicates that the server is issuing a temporary redirect to another URL.
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html for info about HTTP status codes.
The response returned from your request holds the redirect target in its headers.
To print the header values you could do something like this:
for (Map.Entry<String, String> header : res.headers().entrySet()) {
    System.out.println("Key: " + header.getKey() + " - value: " + header.getValue());
}
You need to run this against your own response object, of course.
Now when you look at the headers written to the console you will see the key
Location, which has a value of http://www.betbrain.com/?attempt=1.
This is the URL to redirect to, so you would do something like:
String newRedirectUrl = res.header("Location");
Connection.Response newResponse = Jsoup.connect(newRedirectUrl).execute();
// Parse the response accordingly.
I am not sure why jsoup isn't following this redirect correctly, but it seems like it could have something to do with the standard Java implementation of HTTP redirects.
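When following a redirect by hand like this, two details are easy to miss: header names are case-insensitive, and the Location value may be relative to the requested URL. Below is a minimal sketch with the Java standard library only; the class RedirectSketch and the method redirectTarget are hypothetical, and the header map stands in for what res.headers() would return.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

class RedirectSketch {

    // Returns the absolute redirect target for a 3xx response,
    // or null if the response is not a redirect or has no Location header.
    static String redirectTarget(String requestUrl, int status, Map<String, String> headers) {
        if (status < 300 || status >= 400) {
            return null;
        }
        // Header names are case-insensitive, so look them up that way.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(headers);
        String location = byName.get("Location");
        if (location == null) {
            return null;
        }
        // Resolve against the request URL in case the server sent a relative value.
        return URI.create(requestUrl).resolve(location).toString();
    }

    public static void main(String[] args) {
        Map<String, String> headers =
                Collections.singletonMap("Location", "http://www.betbrain.com/?attempt=1");
        System.out.println(redirectTarget("http://www.betbrain.com", 307, headers));
    }
}
```

The returned string is what would be passed to Jsoup.connect(...) for the follow-up request.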
