html src hidden - java

I'm trying to read a webpage using HttpClient, but some of the HTML is hidden by some JS magic. Try hitting View Source on this page: http://uc.worldoftanks.eu/uc/accounts/#wot&at_search=a
Any idea how to get HttpClient to return the "full" html page?

HttpClient does not process JavaScript, which means no content can be hidden when reading the HTTP content from the server.
It's probably the other way round: the JavaScript that runs on the page likely creates new HTML elements and appends them to the DOM, which is not something you can handle using HttpClient. HttpClient is a communication client designed purely to read data across an HTTP connection.

When that page loads, a request is being sent to
http://uc.worldoftanks.eu/uc/accounts/?type=table&offset=0&limit=25&order_by=name&search=a&echo=1&id=accounts_index
Try hitting that address up with your HttpClient to see the table data. Play with the offset, limit and order_by values to change pagination and sorting.
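Paging through the results only means changing the query string. A minimal sketch that rebuilds the URL above from its parameters, assuming the endpoint keeps the parameter names shown there; the class and method names are hypothetical:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AccountsQuery {
    // Rebuilds the table-data URL the page requests itself:
    // offset/limit control pagination, orderBy controls sorting.
    static URI accountsUri(String search, int offset, int limit, String orderBy) {
        String query = "type=table"
                + "&offset=" + offset
                + "&limit=" + limit
                + "&order_by=" + URLEncoder.encode(orderBy, StandardCharsets.UTF_8)
                + "&search=" + URLEncoder.encode(search, StandardCharsets.UTF_8)
                + "&echo=1&id=accounts_index";
        return URI.create("http://uc.worldoftanks.eu/uc/accounts/?" + query);
    }
}
```

For example, `accountsUri("a", 25, 25, "name")` would address the second page of 25 rows.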
Manually browsing to that URL yields a redirect, though, so it appears that you need to include some of the request headers in your HttpClient request. The full headers of the request my browser issues, which does yield a JSON response with the table data, are as follows:
GET /uc/accounts/?type=table&offset=0&limit=25&order_by=name&search=&echo=1&id=accounts_index HTTP/1.1
Host: uc.worldoftanks.eu
Connection: keep-alive
Referer: http://uc.worldoftanks.eu/uc/accounts/?type=table&offset=0&limit=25&order_by=name&search=a&echo=1&id=accounts_index
X-Requested-With: XMLHttpRequest
X-CSRFToken: 5e33bf57602f76de9285e9b14bcfe7fe
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.107 Safari/535.1
Accept: application/json, text/javascript, */*; q=0.01
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-GB,en;q=0.8,en-US;q=0.6,ar;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: csw_popup=true; __utma=21812543.1316104722.1312873581.1312873581.1312873581.1; __utmb=21812543.2.10.1312873581; __utmc=21812543; __utmz=21812543.1312873581.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); csrftoken=5e33bf57602f76de9285e9b14bcfe7fe
They might be checking X-Requested-With, Accept, or Referer, for instance.
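A sketch of sending those browser headers from Java. It uses the JDK's HttpURLConnection in place of Apache HttpClient, since the original client code isn't shown. Which headers the server actually checks is a guess, the helper names are hypothetical, and the browser's cookies and X-CSRFToken might be required as well:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class AjaxHeaders {
    // Opens (but does not yet send) a GET that looks like the browser's XHR.
    static HttpURLConnection openAjaxConnection(String url, String referer) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("X-Requested-With", "XMLHttpRequest");
        conn.setRequestProperty("Accept", "application/json, text/javascript, */*; q=0.01");
        conn.setRequestProperty("Referer", referer);
        // Assumption: cookies and the X-CSRFToken header may also be needed; not added here.
        return conn;
    }

    // Sends the request and reads the (hopefully JSON) body as UTF-8 text.
    static String readBody(HttpURLConnection conn) throws IOException {
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

If the redirect persists with these three headers, the next candidates would be the Cookie and X-CSRFToken values from the browser request above.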

Related

How to Scrape Network Type 'XHR' / 'Fetch' Data in Selenium 4?

My goal
I'm trying to scrape raw video stream data (.ts files) from twitch.tv using Selenium 4.
All live streams are delivered as chunks of video,
and I can access them manually by:
opening a Chrome tab with a running twitch.tv livestream
opening DevTools (F12)
going to the Network tab > XHR
The .ts (transport stream) files being fetched are the files I want.
I can just double-click on them and Chrome downloads the small video chunk file.
I want to reproduce this using Selenium 4, but I have no experience with web programming (POST, flow, etc.). My current program
is able to scrape image files, but once the received response is a .ts file (XHR/Fetch), it throws:
DevToolsException: {"id":11,"error":{"code":-32000,"message":"No data found for resource with given identifier"},"sessionId":"79BA2C212FABA878DB3524D7D0F49BDC"}
I have tried
calling Network.getResponseBody after the Network.loadingFinished event has fired, but this doesn't work either: the requestId never matches between the two events.
Remark: I'm aware there is a Twitch API.
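import java.util.Optional;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.devtools.Command;
import org.openqa.selenium.devtools.DevTools;
import org.openqa.selenium.devtools.DevToolsException;
// The CDP version package (v104 here) has to match the installed Chrome version.
import org.openqa.selenium.devtools.v104.network.Network;
import org.openqa.selenium.devtools.v104.network.model.RequestId;

public static void main(String[] args) {
    InitializeSeleniumDrivers(); // the asker's own helper (not shown) that sets up `driver`
    driver.get("https://www.twitch.tv/thebausffs");
    DevTools devTools = ((ChromeDriver) driver).getDevTools();
    devTools.createSession();
    devTools.send(Network.clearBrowserCache());
    devTools.send(Network.setCacheDisabled(true));
    devTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.of(100000000)));
    devTools.addListener(Network.responseReceived(), responseReceived -> {
        RequestId requestId = responseReceived.getRequestId();
        try {
            Command<Network.GetResponseBodyResponse> getBody = Network.getResponseBody(requestId);
            Network.GetResponseBodyResponse response = devTools.send(getBody);
        } catch (DevToolsException e) {
            e.printStackTrace();
        }
    });
}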
Headers Example
GENERAL
Request URL: https://video-edge-c55dd0.ams02.abs.hls.ttvnw.net/v1/segment/CrEFZRTkEBMVDg5w4Ygn2pwqXKLGK5NAUAQ7ZWHeCORCjjFxfh9McgTBm_DTCvfP1MrZIg1jb2-oo2769tLAjFKjUd4AQaKtV3LeTEpPJyB_7ZAgolK-dSlLAqnC1xaI7z6iJCC4W1fb5RkkJmLk2D5nYEpyA17gSqe1eoB5zYsrDnal6Sm__B5LhxzOwTPOKI66jxXeIThm8tpaFGabccyd8AcT7RIfqCRv9Jas-IMQCqnBLLpIjk5rC-n4USQzLI6R4xGeTyTwMgX3BQ7EcxB-X62kUvsJm2O7Q2iJEI-ongDyyFRCapzo8iBtGgN2ruxvp8SeCKHO8j9NbS4jymG276ZigtnDXEQbxa6f5i9dHEcf9g1ump4RZtd48eOv6bPsGCDhFfULRd8adcM369ew90NrzyYbImQZnhFcnyqvfYIlCg-FFyjqJHVz37MZGc7TLbSh1YqmrkAClamXb8fFPGCXpsIrY-IDmKgTxh8tEmjbdacBWsKxxwJAOv-H6MUZB67MP1KMeT94YMjGXBcIjJo4JKeFCKoITCLJI4jjzqNmFa_efdlaJ89mUodxQRHJARV3qwdp04TSvZALBbOua6m-0T-01lOEYlr6w408mr5araj7c7gjpvrj_83jb0wqJG7ala1DBUg0U0Vx2rQxzumokyz66MxfMJy3ZSY92L-JdS47RjcOpilnpTI9bI8RPRyY4grds2SHDudWxgp-jJWgHdtbbFpuDCZENwOuU_-Agsf0lA_g59KnXnAuz59yovCO2C_O8ptkyoImgZ47qBPBIn-DDD-rzJloGD-GTQn4zGlmAFcg6GunjeW3PbHjKjMz8vA_K8NOF7ofO94YOtj_1khbCFGfH2_dF8zDwMSieR5Mvg7upQdzwgl_GAmf7OIAbHXwA1DqamnbAeWundcaDEM8dWDJF-pfTicm0CABKglldS13ZXN0LTIwtwQ.ts
Request Method: GET
Status Code: 200 OK
Remote Address: 185.42.204.31:443
Referrer Policy: strict-origin-when-cross-origin
RESPONSE HEADER
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: no-cache, no-store, private
Content-Length: 1589164
Content-Type: application/octet-stream
Date: Sun, 14 Aug 2022 16:56:31 GMT
REQUEST HEADER
Provisional headers are shown
Referer
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.81 Safari/537.36

HttpOnly cookies not sent by request

I want to use HttpOnly cookies, and I set them in Java as follows:
...
Cookie accessTokenCookie = new Cookie("token", userToken);
accessTokenCookie.setHttpOnly(true);
accessTokenCookie.setSecure(true);
accessTokenCookie.setPath("/");
response.addCookie(accessTokenCookie);
Cookie refreshTokenCookie = new Cookie("refreshToken", refreshToken);
refreshTokenCookie.setHttpOnly(true);
refreshTokenCookie.setSecure(true);
refreshTokenCookie.setPath("/");
response.addCookie(refreshTokenCookie);
...
On the client side I receive the response with the cookies, but when I send the next request, the cookies are not on it. Maybe I am missing something, but as I understand it, the browser has to send HttpOnly cookies back on every request to the defined path (JavaScript does not have access to those cookies).
I have the following Request Headers:
Accept:application/json, text/plain, */*
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8,hu;q=0.6,ro;q=0.4,fr;q=0.2,de;q=0.2
Authorization:Basic dXNlcm5hbWU6cGFzc3dvcmQ=
Connection:keep-alive
Content-Length:35
content-type:text/plain
Host:localhost:8080
Origin:http://localhost:4200
Referer:http://localhost:4200/
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
X-Requested-With:XMLHttpRequest
and the following response headers:
Access-Control-Allow-Credentials:true
Access-Control-Allow-Origin:http://localhost:4200
Access-Control-Expose-Headers:Access-Control-Allow-Origin, Content-Type, Date, Link, Server, X-Application-Context, X-Total-Count
Cache-Control:no-cache, no-store, max-age=0, must-revalidate
Content-Length:482
Content-Type:application/json;charset=ISO-8859-1
Date:Fri, 03 Feb 2017 13:11:29 GMT
Expires:0
Pragma:no-cache
Set-Cookie:token=eyJhbGciO;Max-Age=10000;path=/;Secure;HttpOnly
Set-Cookie:refreshToken=eyJhb8w;Max-Age=10000;path=/;Secure;HttpOnly
Vary:Origin
Also, on the client side I use withCredentials: true in Angular 2 and send X-Requested-With: XMLHttpRequest as a request header.
And the request is cross-domain.
Yes, you are correct: once your browser has the cookie, it should send it automatically as long as it has not expired, and the HttpOnly flag means it cannot be accessed or manipulated via JavaScript.
However,
you need to ensure that the cookie you are sending is not cross-domain; if you need it to work cross-domain, you will have to handle it differently.
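One way to handle it differently is to control the cookie attributes yourself. javax.servlet.http.Cookie (Servlet 4 and earlier) has no setter for SameSite, so a common workaround is to write the Set-Cookie header by hand, e.g. with response.addHeader("Set-Cookie", ...). A minimal sketch; the helper is hypothetical, and whether SameSite=None is what your setup needs is an assumption: it only matters when the page and the API are on different sites (different registrable domains, not merely different ports), and browsers require Secure alongside it. Keep in mind too that a Secure cookie is only sent over HTTPS (some newer browsers exempt localhost), which matters for an API served from http://localhost:8080.

```java
public class SetCookie {
    // Formats a Set-Cookie header value, in the same attribute style as the
    // response above, plus an optional SameSite attribute (pass null to omit it).
    static String header(String name, String value, int maxAgeSeconds, String path,
                         boolean secure, boolean httpOnly, String sameSite) {
        StringBuilder sb = new StringBuilder(name).append('=').append(value)
                .append("; Max-Age=").append(maxAgeSeconds)
                .append("; Path=").append(path);
        if (secure) {
            sb.append("; Secure"); // only sent over HTTPS; required when SameSite=None
        }
        if (httpOnly) {
            sb.append("; HttpOnly"); // hidden from JavaScript
        }
        if (sameSite != null) {
            sb.append("; SameSite=").append(sameSite);
        }
        return sb.toString();
    }
}
```

In a servlet this would be used as `response.addHeader("Set-Cookie", SetCookie.header("token", userToken, 10000, "/", true, true, "None"))` instead of `response.addCookie(...)`.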

Jsoup authentication failed

I'm trying to connect to this website: https://ent.enteduc.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=1 with the following code:
import android.util.Log;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

Connection.Response response = Jsoup.connect("https://ent.enteduc.fr/CookieAuth.dll?GetLogon?curl=Z2F&flags=0&forcedownlevel=0&formdir=1&username=XXX&password=XXX&trusted=4&SubmitCreds.x=36&SubmitCreds.y=7&SubmitCreds=Ouvrir+une+session")
        .method(Connection.Method.GET)
        .execute();
Document Doc = Jsoup.connect("https://ent.enteduc.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=1")
        .data("username","myusername")
        .data("password","mypassword")
        .data("curl","Z2F")
        .data("flags","0")
        .data("forcedownlevel","0")
        .data("formdir","1")
        .data("trusted","4")
        .data("SubmitCreds.x","40") //Seems to send the coordinates of the cursor
        .data("SubmitCreds.y","12") //Seems to send the coordinates of the cursor
        .data("SubmitCreds","Ouvrir une session")
        .cookies(response.cookies())
        .post();
Log.e("Body", Doc.body().toString());
But the displayed "Body" is still the authentication page (no error in Logcat).
What's wrong?
Here are the details of the connection, captured with Chrome's console:
Remote Address:85.90.60.205:443
Request URL:https://ent.enteduc.fr/CookieAuth.dll?Logon
Request Method:POST
Status Code:302 Moved Temporarily
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:165
Content-Type:application/x-www-form-urlencoded
Cookie:ISAWPLB{FE9B5C07-18E7-4D86-BC7C-2F0AFE4F36BF}={8A3F320B-C8EB-40F9-A11E-D036A91F953F}; __utma=136247269.742318163.1408441429.1408445338.1408450626.3; __utmb=136247269.5.10.1408450626; __utmc=136247269; __utmz=136247269.1408441429.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); WSS_KeepSessionAuthenticated=; logondata=acc=0&lgn=*********
Host:ent.enteduc.fr
Origin:https://ent.enteduc.fr
Referer:https://ent.enteduc.fr/CookieAuth.dll?GetLogon?curl=Z2F&reason=0&formdir=1
User-Agent:Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36
Query String Parameters
Logon:
Form Data
curl:Z2F
flags:0
forcedownlevel:0
formdir:1
username:myusername
password:mypass
trusted:4
SubmitCreds.x:53
SubmitCreds.y:12
SubmitCreds:Ouvrir une session
Response Headers
Connection:close
Content-Length:0
Location:https://ent.enteduc.fr/
Set-Cookie:cadata6A45CD714D774496A399F96AC521E21E....
There is nothing wrong with your code; it works. I tried the user name and password you supplied. This is what the site does:
You log in successfully, and the server sends an HTTP 302 to the path / along with a cookie that identifies you:
HTTP/1.1 302 Moved Temporarily
Location: https://ent.enteduc.fr/
Set-Cookie: XXX
The browser then requests / and the server responds with another HTTP 302:
HTTP/1.1 302 Found
Connection: Keep-Alive
Location: /etabs/0680001F/Pages/Accueil.aspx
Requesting /etabs/0680001F/Pages/Accueil.aspx results in a 200 OK with HTML content written in French. Excuse me! I don't speak French.
Change your code to follow the redirects and set the cookies on each step and you should be fine.
[EDIT]
When you're done please remove the authentication info you supplied on this post.
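Jsoup's Connection has a followRedirects(boolean) option, but the point of the answer is that the cookie handed out with each 302 must travel with every following request. With the JDK 11+ java.net.http client (not available on Android, so this is an alternative rather than a drop-in fix) both jobs can be delegated to the client: a CookieManager records the cookies it receives and Redirect.NORMAL follows the 302s. A sketch under those assumptions; the class and method names are hypothetical:

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class RedirectingLogin {
    // One client = one cookie jar: the CookieManager records Set-Cookie headers
    // the client receives and adds matching cookies to later requests.
    static HttpClient newClient() {
        return HttpClient.newBuilder()
                .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
                .followRedirects(HttpClient.Redirect.NORMAL) // follow 3xx, except HTTPS -> HTTP
                .build();
    }

    // Builds a POST of an already URL-encoded form body to the logon endpoint.
    static HttpRequest logonRequest(URI logonUri, String formBody) {
        return HttpRequest.newBuilder(logonUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }
}
```

Sending `logonRequest(URI.create("https://ent.enteduc.fr/CookieAuth.dll?Logon"), body)` with `client.send(..., HttpResponse.BodyHandlers.ofString())` should then end on the page behind the redirects, with the session cookie kept in the client for subsequent requests.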

login to http website java

I am trying to log in to an HTTP website for the first time, and I am having a hard time understanding the proper format for sending arguments. I have looked at other examples, but they don't seem to work for me, so I thought I would see if someone can explain this to me. At this point my code seems to do absolutely nothing, but here it is:
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;

HttpURLConnection url = (HttpURLConnection) new URL("http://www.myameego.com/index2.php?do=login").openConnection();
url.setDoOutput(true);
url.setRequestMethod("POST");
OutputStreamWriter writer = new OutputStreamWriter(url.getOutputStream());
writer.write("X-Mapping-fjhppofk=6A991610BA398B3A39F4B491D5382BB4; PHPSESSID=kbo25e08t3qvu08l1shkq8kk94; userName=coled; pass=ed45d626b07112a8a501d9672f3b92796a6754b8d8d9cb4c617fec9774889220; clientID=129; X-Mapping-fjhppofk=DCE62FE972E1EF2F12D0060EC74C3681; PHPSESSID=ukeo21oldb5pqsntu7kl8j3b96");
writer.flush();
I downloaded an HTTP sniffer, thinking that I could read what the browser was sending. That is how I got the write() line: it is the cookie that Internet Explorer sent. I also viewed the source code of the login screen and found a block of code near the bottom that looks like it's responsible for login.
http://www.myameego.com/index2.php?do=login
Can someone tell me how I would go about hooking into this interface? I don't understand how this works. If it helps, this is the full packet from my manual login through the browser, captured with my HTTP sniffer.
Host Name: www.myameego.com
Method: POST
Path: /index2.php?do=login
User Agent: Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; NP06)
Response Code: 302
Response String: found
Content Type: text/html; charset=UTF-8
Referer: http://www.myameego.com/index.php?do=login
Transfer Encoding: chunked
Server: Apache
Content Length: 17817
Connection: Keep-Alive
Cache Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Location: /Ameego/index.php
Cookie: X-Mapping-fjhppofk=6A991610BA398B3A39F4B491D5382BB4; PHPSESSID=kbo25e08t3qvu08l1shkq8kk94; userName=coled; pass=ed45d626b07112a8a501d9672f3b92796a6754b8d8d9cb4c617fec9774889220; clientID=129; X-Mapping-fjhppofk=DCE62FE972E1EF2F12D0060EC74C3681; PHPSESSID=ukeo21oldb5pqsntu7kl8j3b96
URL: http://www.myameego.com/index2.php?do=login
How can I make a request like the one above? Any guidance would be greatly appreciated.
I looked into the link you posted, and the HTTP sniffer shows that the POST request is being sent, but the Cookie line doesn't match that of the manual browser request.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

HttpURLConnection httpConnection = (HttpURLConnection)new URL("http://www.myameego.com/index2.php?do=login").openConnection();
httpConnection.setDoOutput(true);
httpConnection.setRequestMethod("POST");
httpConnection.setRequestProperty("Accept-Charset","UTF-8");
httpConnection.setRequestProperty("User-Agent","Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0; NP06)");
httpConnection.setRequestProperty("Content-Type","application/x-www-form-urlencoded;charset=UTF-8");
String info = String.format("user=%s&coled=%s",URLEncoder.encode("user","UTF-8"),URLEncoder.encode("coled","UTF-8"));
info += String.format("pass=%s&MYPASS=%s",URLEncoder.encode("pass","UTF-8"),URLEncoder.encode("MYPASS","UTF-8"));
info += String.format("clientID=%s&129=%s",URLEncoder.encode("clientID","UTF-8"),URLEncoder.encode("129","UTF-8"));
info += String.format("login=%s&Sign In=%s",URLEncoder.encode("login","UTF-8"),URLEncoder.encode("Sign In","UTF-8"));
httpConnection.setRequestProperty("Cookie",info);
OutputStream output = httpConnection.getOutputStream();
output.write(info.getBytes("UTF-8"));
int x;
while((x = httpConnection.getInputStream().read()) != -1)System.out.print((char)x);
My cookie:
user=user&coled=coledpass=pass&MYPASS=MYPASSclientID=clientID&129=129login=login&Sign In=Sign+In
Browser's cookie:
X-Mapping-fjhppofk=6A991610BA398B3A39F4B491D5382BB4; PHPSESSID=112tg9i4afau5i382hui705553
Does anyone know what I may be missing here?
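With Jsoup this should be as simple as:
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// First GET the login page to pick up the session cookies...
Connection.Response response = Jsoup.connect("http://www.myameego.com/index2.php?do=login")
        .method(Connection.Method.GET)
        .execute();
// ...then POST the credentials as form data, sending those cookies back.
Document page = Jsoup.connect("http://www.myameego.com/index2.php?do=login")
        .data("user", "login")
        .data("pass", "password")
        .data("clientID", "123456")
        .cookies(response.cookies())
        .post();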
Gathered with Google Chrome Developer Tools
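Without Jsoup, the two problems in the HttpURLConnection attempt are that the credentials went into a Cookie header and that each pair was written as name=name&value=value with no & between pairs. A form POST needs name=value pairs, joined by &, in the request body. A minimal sketch; the class is hypothetical, and the field names user, pass, clientID and login are taken from the attempt above, not verified against the real form. Calling java.net.CookieHandler.setDefault(new java.net.CookieManager()) once beforehand lets HttpURLConnection keep the PHPSESSID cookie the site hands out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormLogin {
    // Encodes fields as name=value pairs joined by '&' (application/x-www-form-urlencoded).
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Sends the form in the request body (not in a Cookie header) and returns the status code.
    static int post(String url, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

With a LinkedHashMap holding user=coled, pass=MYPASS, clientID=129 and login=Sign In, formBody returns user=coled&pass=MYPASS&clientID=129&login=Sign+In.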

URL-encoding data for a POST request body. Am I using the wrong charset?

I want to replicate a working POST request in Java. For testing purposes, let's take a message like 'äöõüäöõüäöõüäöõü'.
Working POST request (with encoded message of 'äöõüäöõüäöõüäöõü'):
Header
POST http://www.mysite.com/newreply.php?do=postreply&t=477352 HTTP/1.1
Host: www.warriorforum.com
Connection: keep-alive
Content-Length: 403
Origin: http://www.mysite.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko)Chrome/14.0.835.202 Safari/535.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
Referer: http://www.mysite.com/test-forum/477352-test.html
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: bblastvisit=1319205053; bblastactivity=0; bbuserid=265374; bbpassword=1125e9ec1ab41f532ab8ec6f77ddaf94; bbsessionhash=91444317c100996990a04d6c5bbd8375;
Body
securitytoken=1319806096-618e5f9012901e2d818bf2c74c2121baa064be57&ajax=1&ajax_lastpost=1319806096&**message=%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC**&wysiwyg=0&styleid=1&signature=1&fromquickreply=1&s=&do=postreply&t=477352&p=who%20cares&specifiedpost=0&parseurl=1&loggedinuser=265374
As we can see in the request body, 'äöõüäöõüäöõüäöõü' is encoded as: %u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC
Now I want to replicate it.
Let's URL-encode the text with the UTF-8 charset in Java:
String userText = "äöõüäöõüäöõüäöõü";
String encoded = URLEncoder.encode(userText, "utf-8");
Result: %C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%C3%A4%C3%B6%C3%B5%C3%BC%0A%0A%0A%5BSIZE%3D%221%22%5D%5BI%5D << NOT THE SAME
Let's try ISO-8859-1:
String userText = "äöõüäöõüäöõüäöõü";
String encoded = URLEncoder.encode(userText, "ISO-8859-1");
Result: %E4%F6%F5%FC%E4%F6%F5%FC%E4%F6%F5%FC%E4%F6%F5%FC%0A%0A%0A%5BSIZE%3D%221%22%5D%5BI%5D << NOT THE SAME
Neither of them produces the same encoded string as the working example, yet both have the same input. What am I missing here?
%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC%u00E4%u00F6%u00F5%u00FC
I don't know what the above data is encoded as, but it isn't application/x-www-form-urlencoded; charset=UTF-8 as the request claims. This is not legal data for this MIME type.
It looks like some UTF-16BE-encoded form.
URLEncoder.encode(userText, "utf-8") would be the correct way to encode application/x-www-form-urlencoded; charset=UTF-8 values if that were actually what the server expected. (ref)
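If the server really does expect that notation (an assumption; it is not standard form encoding), it can be reproduced by writing each non-ASCII UTF-16 code unit as %u plus four uppercase hex digits, the same notation JavaScript's non-standard escape() uses for code units above 0xFF. The sample body also encodes the space in p=who%20cares as %20 rather than +, so the sketch does the same. The class name is hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PercentUEncoder {
    // Encodes a space as %20, other ASCII characters with standard percent-encoding,
    // and every non-ASCII UTF-16 code unit as %uXXXX, e.g. 'ä' (U+00E4) -> %u00E4.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ' ') {
                out.append("%20");
            } else if (c < 0x80) {
                out.append(URLEncoder.encode(String.valueOf(c), StandardCharsets.UTF_8));
            } else {
                out.append(String.format("%%u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

Characters outside the Basic Multilingual Plane come out as two %u escapes (one per surrogate), which is how a UTF-16 based encoder would write them too; whether the server decodes them correctly is untested.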
