I have a website.
You must be logged in to see its contents.
I use this code to log in:
doc = Jsoup.connect("http://46.137.207.181/Account/Login.aspx")
.data("ctl00$MainContent$LoginUser$UserName", "1234")
.data("ctl00$MainContent$LoginUser$Password", "123456")
.data("__VIEWSTATE","/wEPDwULLTEyMDAyNTY1NjJkGAEFHl9fQ29udHJvbHNSZXF1aXJlUG9zdEJhY2tLZXlfXxYBBSZjdGwwMCRNYWluQ29udGVudCRMb2dpblVzZXIkUmVtZW1iZXJNZUHk9FMvtsvPHqlP3vAV+1oloaxe4Asr7RQX5XFptqGz")
.data("__EVENTVALIDATION","/wEWBQLup8mjCgLFyvjkDwLQzbOWAgKVu47QDwKnwKnjBTL6Xsxc9zQnY8p9KVlFJ/8HIHqlOGl9uClF4ktcWYJ5")
.data("ctl00$MainContent$LoginUser$LoginButton","2")
.post();
Then I fetch the page that requires login:
doc2 = Jsoup.connect("http://46.137.207.181/Groups.aspx").get();
s=doc.title();
Elements kelime = doc.select("td");
for (Element link : kelime) {
linkHref = link.attr("hh");
But the result does not show the logged-in page.
How can I do this?
What is happening in your example is that you are logging in with form data to Login.aspx and creating a session, but the request to Groups.aspx doesn't carry that session data, so the request is not authenticated.
Login.aspx will return a session cookie, and you need to pass that cookie on to the next request.
See the answers to this jsoup login question for good examples.
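As a rough illustration of what "passing the cookie on" means at the HTTP level, here is a minimal sketch using only the JDK. `SessionCookies` and its method names are hypothetical helpers, not part of jsoup, and the cookie names in `main` are made up. It collects the name/value pairs from the `Set-Cookie` headers of a login response and formats them as the `Cookie` header of the next request.

```java
import java.net.HttpCookie;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of jsoup): remembers the cookies a login
// response set so they can be replayed on the next request.
class SessionCookies {

    // Collects name/value pairs from raw Set-Cookie header values.
    static Map<String, String> fromSetCookieHeaders(List<String> setCookieHeaders) {
        Map<String, String> jar = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            for (HttpCookie cookie : HttpCookie.parse(header)) {
                jar.put(cookie.getName(), cookie.getValue());
            }
        }
        return jar;
    }

    // Formats the stored cookies as the value of a Cookie request header.
    static String toCookieHeader(Map<String, String> jar) {
        return jar.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Made-up header values standing in for a real login response.
        Map<String, String> jar = fromSetCookieHeaders(Arrays.asList(
                "sessionId=abc123; Path=/; HttpOnly",
                "auth=token456; Path=/"));
        System.out.println(toCookieHeader(jar));
    }
}
```

With jsoup you normally do not need to do this by hand: the map returned by `Connection.Response.cookies()` can be handed to `.cookies(...)` on the next connection, which has the same effect.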
Connection.Response loginPage = Jsoup.connect("https://accounts.google.com/ServiceLogin?elo=1")
        .method(Connection.Method.GET)
        .execute();
Document loginDocument = loginPage.parse();
Element form = loginDocument.getElementById("gaia_loginform");
Connection connection1 = Jsoup.connect("https://accounts.google.com/signin/challenge/sl/password")
        .cookies(loginPage.cookies())
        .method(Connection.Method.POST);
// copy every named input of the form (including hidden fields) into the POST data
Elements inputElements = form.getElementsByTag("input");
for (Element inputElement : inputElements) {
    String key = inputElement.attr("name");
    String value = inputElement.attr("value");
    if (value != null && key != null && !key.equals("")) {
        connection1.data(key, value);
    }
}
connection1.data("Email", "myemeailv#gmail.com");
connection1.data("Passwd", "mypassword");
// trying to load gmail
Connection.Response response = connection1.execute();
Connection.Response main = Jsoup.connect("https://mail.google.com/mail/u/0/?tab=wm#inbox")
        .method(Connection.Method.GET)
        .cookies(response.cookies())
        .execute();
System.out.println(main.body());
In the code above I'm trying to programmatically submit the gaia_loginform form, which can be found on the Google login page. In the first step I load the login page using the GET method. In the second step I create a connection using the data loaded from the gaia_loginform form and submit the form via POST.
As a result I expected to see some error messages, but only the login page is returned, without any errors. I know there may be some kind of API for Gmail manipulation, but for now I'm just trying to log in.
I know very little about Selenium, so I won't talk about it. After you enter the email address and press Next, Google sends some extra parameters to the browser via AJAX (for example, bgresponse), and these are added to the POST parameters alongside the previous ones.
The reason you receive the login page is that you send a different request without the proper cookies, and Google redirects to the login page.
Connection.Response main = Jsoup.connect("https://mail.google.com/mail/u/0/?tab=wm#inbox")
.method(Connection.Method.GET)
.cookies(response.cookies())
.execute();
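One possible gap in the request quoted above is that it only sends the cookies from the last response. Assuming the cookies set by the earlier login-page GET are still needed, a minimal sketch of combining both cookie maps could look like this. `CookieMerge` is a hypothetical helper and the cookie names in the usage note are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: combines the cookie maps of several responses.
class CookieMerge {

    // Later maps win when the same cookie name appears more than once,
    // which mirrors a server overwriting a cookie it set earlier.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieMaps) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieMaps) {
            merged.putAll(cookies);
        }
        return merged;
    }
}
```

With jsoup this would be used as `.cookies(CookieMerge.merge(loginPage.cookies(), response.cookies()))`. This alone may well not be enough for Google, since the extra AJAX parameters mentioned above would still be missing.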
I'm creating an Android application which uses JSOUP to log into a website. In order to login I'm simply using the following:
loginDoc = Jsoup.connect(loginURL).get();
So this connects to the login URL, which contains the user's details. What I want to do is find out the session ID (cookie data) for this session. How do I do this? As you can see, I'm using a .get request, and all of the examples I've seen on Stack Overflow and elsewhere use .post requests. Does anyone have any ideas?
Thanks,
The .get() method returns a Document, but if you do an .execute() instead, you get a Response object with the cookies, headers, et al.
For example:
Connection.Response res = Jsoup.connect(loginUrl).execute();
String sessionId = res.cookie("sessionId");
Document doc = res.parse();
I am trying to crawl a web page which requires authentication. I am able to access that page in the browser when I am logged in. I am using the jsoup library (http://jsoup.org/) to parse HTML pages.
public static void main(String[] args) throws IOException {
    // need http protocol
    Document doc = Jsoup.connect("http://www.secinfo.com/$/SEC/Filing.asp?T=r643.91Dx_2nx").get();
    // get page title
    String title = doc.title();
    System.out.println("title : " + title);
    // get all links
    Elements links = doc.select("a");
    for (Element link : links) {
        // get the value from href attribute
        System.out.println("\nlink : " + link.attr("href"));
    }
    System.out.println();
}
Output :
title : SEC Info - Sign In
This is getting the content of the sign-in page, not the actual URL I am passing. I am registered on secinfo.com, and while running this program I am logged in from my default browser, Firefox.
Being logged in from your default browser will not help. Your Java program is a separate process and it does not share the browser's session (its cookies).
On the other hand, secinfo requires authentication, and jsoup allows you to pass authentication details.
It works for me when I pass the authentication details:
Please check this answer (Jsoup connection with basic access authentication)
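For reference, a minimal sketch of how a basic access authentication header value is built, using only the JDK. `BasicAuth` is a hypothetical helper, and whether secinfo.com actually accepts Basic authentication rather than a form login is an assumption here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP "Authorization" header
// for basic access authentication.
class BasicAuth {

    // Basic auth is "Basic " followed by base64("user:password").
    static String headerValue(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Placeholder credentials for illustration only.
        System.out.println(headerValue("myUserName", "myPassword"));
    }
}
```

With jsoup the value would be attached to a request with `.header("Authorization", BasicAuth.headerValue(user, password))`.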
Jsoup's connect() also supports post() with method chaining, if your target site's login mechanism works with a POST request:
Document doc = Jsoup.connect("url")
.data("aUserName", "myUserName")
.data("aPassword", "myPassword")
.userAgent("Mozilla")
.timeout(3000)
.post();
But what if the page you are trying to get requires the cookie to be sent with each subsequent request? Try using HttpURLConnection with POST and read the cookie from the HTTP response header. HttpClient will make this task easier for you. Use the library to fetch the web page as a string, then pass the string to the Jsoup.parse() function to get the document.
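If you go the HttpURLConnection route, the login fields have to be sent as a URL-encoded form body. Below is a minimal sketch of that encoding step with the JDK only. `FormBody` is a hypothetical helper and the field names are the placeholders from the snippet above. It assumes Java 10 or later for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: encodes login fields as an
// application/x-www-form-urlencoded request body.
class FormBody {

    static String encode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    // Requires Java 10+ for the Charset overload.
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("aUserName", "myUserName");
        fields.put("aPassword", "myPassword");
        System.out.println(encode(fields));
    }
}
```

The resulting string is what gets written to the connection's output stream. After the POST, the session cookie can be read from the `Set-Cookie` entries of `getHeaderFields()` on the connection.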
You have to sign in with a POST request and preserve the cookies you get back. That is where your session info is stored. I wrote an example here: Jsoup can't Login on Page.
The website in that example is an exception: it already sets the session cookie on the login page. You can leave out that step if it works for you.
The exact POST request differs from website to website. You have to dig it out of the HTML, or install a browser plugin and intercept the POST requests.
For example, consider a Gmail login test. When doing it manually, we get the login page the first time, and from then on we land directly on the inbox page.
If you try to do the same thing in WebDriver (run the login test twice), every attempt gets the login page, as if we had never logged in from this machine. What is happening behind the scenes to maintain the session, with respect to cookies?
Here is the description & code snippet from selenium docs to add or remove cookies:
Before we leave these next steps, you may be interested in
understanding how to use cookies. First of all, you need to be on the
domain that the cookie will be valid for. If you are trying to preset
cookies before you start interacting with a site and your homepage is
large / takes a while to load an alternative is to find a smaller page
on the site, typically the 404 page is small
(http://example.com/some404page)
// Go to the correct domain
driver.get("http://www.example.com");
// Now set the cookie. This one's valid for the entire domain
Cookie cookie = new Cookie("key", "value");
driver.manage().addCookie(cookie);
// And now output all the available cookies for the current URL
Set<Cookie> allCookies = driver.manage().getCookies();
for (Cookie loadedCookie : allCookies) {
System.out.println(String.format("%s -> %s", loadedCookie.getName(), loadedCookie.getValue()));
}
// You can delete cookies in 3 ways
// By name
driver.manage().deleteCookieNamed("CookieName");
// By Cookie
driver.manage().deleteCookie(loadedCookie);
// Or all of them
driver.manage().deleteAllCookies();
Right now, we have one CSRF token per session, and we add this token to the JSPs using a hidden field. The following snippet generates only one token per session:
token = (String) session.getAttribute(CSRF_TOKEN_FOR_SESSION_NAME);
if (null==token) {
token = UUID.randomUUID().toString();
session.setAttribute(CSRF_TOKEN_FOR_SESSION_NAME, token);
}
and for every request,
//calls the above snippet and this time token will not be null
String st = CSRFTokenManager.getTokenForSession(request.getSession());
String rt = CSRFTokenManager.getTokenFromRequest(request);
Here, equals is used to compare the two strings, returning either true or false.
My question is: what happens if I generate a new token for every request instead of reusing the token from the session? When comparing, I would still get the tokens from the session and the request. Is this a good idea, or am I missing something?
Instead of using the above snippets, I will go with the following:
//for every request generate a new and set in session
token = UUID.randomUUID().toString();
session.setAttribute(CSRF_TOKEN_FOR_SESSION_NAME, token);
//get the token from session and request and compare
String st = (String) request.getSession().getAttribute(CSRF_TOKEN_FOR_SESSION_NAME);
String rt = CSRFTokenManager.getTokenFromRequest(request);
You'll want to flip around the flow that you stated above. After every compare you should create a new token.
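A minimal sketch of that flipped flow, with a plain `Map` standing in for the `HttpSession` so it runs without a servlet container. `PerRequestCsrf` and its method names are hypothetical, and rotating only after a successful comparison is a design assumption of this sketch rather than something stated above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch only: a plain Map stands in for HttpSession attributes.
class PerRequestCsrf {
    static final String CSRF_TOKEN_FOR_SESSION_NAME = "CSRF_TOKEN_FOR_SESSION_NAME";

    // Generates a fresh token and stores it in the session.
    static String issue(Map<String, Object> session) {
        String token = UUID.randomUUID().toString();
        session.put(CSRF_TOKEN_FOR_SESSION_NAME, token);
        return token;
    }

    // Compares first, then rotates, so the stored token is never
    // replaced before the incoming one has been checked against it.
    static boolean verifyThenRotate(Map<String, Object> session, String requestToken) {
        String sessionToken = (String) session.get(CSRF_TOKEN_FOR_SESSION_NAME);
        boolean valid = sessionToken != null && requestToken != null
                && MessageDigest.isEqual(
                        sessionToken.getBytes(StandardCharsets.UTF_8),
                        requestToken.getBytes(StandardCharsets.UTF_8));
        if (valid) {
            issue(session); // the next page must embed this new token
        }
        return valid;
    }

    public static void main(String[] args) {
        Map<String, Object> session = new HashMap<>();
        String tokenA = issue(session);
        System.out.println(verifyThenRotate(session, tokenA)); // first use of tokenA
        System.out.println(verifyThenRotate(session, tokenA)); // reuse of the now-stale tokenA
    }
}
```

Generating the new token before the comparison, as in the question's second snippet, would overwrite the value the request token has to match, so every request would fail.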
One large drawback to token-per-request is if the user hits the back button in their browser:
User visits Page1 and stores TokenA in session.
User clicks a link to Page2, submitting TokenA. The app verifies TokenA in session and gives the user TokenB.
User hits the back button to go back to Page1; the session information is not updated.
Page1 still only has TokenA. The user clicks a link or submits a form to Page3, submitting TokenA, but the session only knows about TokenB.
The app considers this a CSRF attack.
Because of this, you need to take great care of how and when the tokens are updated.
Apart from the solution suggested by Jay, I suggest you prevent caching of your web pages by setting cache-control headers in the response to the client.
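A minimal sketch of one commonly used set of such headers. `NoCacheHeaders` is a hypothetical helper, and the exact header values are an assumption about what "various cache-control headers" means here. It takes a setter function so that it compiles without the servlet API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper: applies headers that ask browsers and proxies
// not to cache the page, so Back re-requests it instead of showing a stale copy.
class NoCacheHeaders {

    static void apply(BiConsumer<String, String> setHeader) {
        setHeader.accept("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP/1.1
        setHeader.accept("Pragma", "no-cache");                                   // HTTP/1.0 clients
        setHeader.accept("Expires", "0");                                         // proxies
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        apply(headers::put);
        headers.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

In a servlet or filter this would be called as `NoCacheHeaders.apply(response::setHeader)`.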