Error Accessing Website - java

I am trying to get the HTML of MyAnimeList.net (specifically this page: http://myanimelist.net/anime.php?q=toradora!). I am using a method that has worked for me before on other websites, but it doesn't work here.
The method I use:
public String getWebsiteSourceCode(String sURL){
    try{
        URL url = new URL(sURL);
        URLConnection urlConn= url.openConnection();
        //NEW LINE
        urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36");
        BufferedReader in = new BufferedReader(new InputStreamReader(
                urlConn.getInputStream(), "UTF-8"));
        String inputLine;
        StringBuilder a = new StringBuilder();
        while ((inputLine = in.readLine()) != null)
            a.append(inputLine);
        in.close();
        return a.toString();
    }catch(Exception e){
        e.printStackTrace();
        return "null";
    }
}
What I get:
<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head><iframe src="/_Incapsula_Resource?CWUDNSAI=9&incident_id=124000930038292057-125560654487356886&edet=12&cinfo=464f095fc75381e904000000" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 124000930038292057-125560654487356886</iframe></html>
What I should be getting: the HTML of the web page (I can see it in Google Chrome via right click + View page source, and it is completely different from what my method returns).
The response mentions ROBOTS, so I assume the website uses cookies or something similar to tell whether I am using a browser or a bot. Is it possible to bypass this, and if so, how would I go about it? Thanks for your help :) (Preferably in Java, since that is what I am using.)
EDIT: I tried adding this line:
urlConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.102 Safari/537.36");
but I get the same error...

Try:
URL url = new URL(sURL);
URLConnection urlConn= url.openConnection();
urlConn.setRequestProperty("User-Agent", "....");
For an idea of what to put in place of "....", have a look at http://www.whatsmyuseragent.com/
Using your code I get
<html><head><META NAME="robots" CONTENT="noindex,nofollow"><script>(function(){f
unction getSessionCookies(){cookieArray=new Array();var cName=/^\s?incap_ses_/;v
ar c=document.cookie.split(";");for(var i=0;i<c.length;i++){key=c[i].substr(0,c[
i].indexOf("="));value=c[i].substr(c[i].indexOf("=")+1,c[i].length);if(cName.tes
t(key)){cookieArray[cookieArray.length]=value}}return cookieArray}function setIn
capCookie(vArray){try{cookies=getSessionCookies();digests=new Array(cookies.leng
th);for(var i=0;i<cookies.length;i++){digests[i]=simpleDigest((vArray)+cookies[i
])}res=vArray+",digest="+(digests.join())}catch(e){res=vArray+",digest="+(encode
URIComponent(e.toString()))}createCookie("___utmvc",res,20)}function simpleDiges
t(mystr){var res=0;for(var i=0;i<mystr.length;i++){res+=mystr.charCodeAt(i)}retu
rn res}function createCookie(name,value,seconds){if(seconds){var date=new Date()
;date.setTime(date.getTime()+(seconds*1000));var expires="; expires="+date.toGMT
String()}else{var expires=""}document.cookie=name+"="+value+expires+"; path=/"}f
unction test(o){var res="";var vArray=new Array();for(test in o){switch(o[test])
{case"exists":try{vArray[vArray.length]=encodeURIComponent(test+"="+typeof(eval(
test)))}catch(e){vArray[vArray.length]=encodeURIComponent(test+"="+e)}break;case
"value":try{vArray[vArray.length]=encodeURIComponent(test+"="+eval(test).toStrin
g())}catch(e){vArray[vArray.length]=encodeURIComponent(test+"="+e)}break;case"pl
ugins":try{p=navigator.plugins;pres="";for(a in p){pres+=(p[a]["description"]+"
").substring(0,20)}vArray[vArray.length]=encodeURIComponent("plugins="+pres)}cat
ch(e){vArray[vArray.length]=encodeURIComponent("plugins="+e)}break;case"plugin":
try{a=navigator.plugins;for(i in a){f=a[i]["filename"].split(".");if(f.length==2
){vArray[vArray.length]=encodeURIComponent("plugin="+f[1]);break}}}catch(e){vArr
ay[vArray.length]=encodeURIComponent("plugin="+e)}break}}vArray=vArray.join();re
turn vArray}var o={navigator:"exists","navigator.vendor":"value",opera:"exists",
ActiveXObject:"exists","navigator.appName":"value",platform:"plugin",webkitURL:"
exists","navigator.plugins.length==0":"value"};try{setIncapCookie(test(o));docum
ent.createElement("img").src="/_Incapsula_Resource?SWKMTFSR=1&e="+Math.random()}
catch(e){img=document.createElement("img");img.src="/_Incapsula_Resource?SWKMTFS
R=1&e="+e}})();</script><script>(function() { var z="";var b="7472797B7661722078
68723B76617220743D6E6577204461746528292E67657454696D6528293B76617220737461747573
3D227374617274223B7661722074696D696E673D6E65772041727261792833293B77696E646F772E
6F6E756E6C6F61643D66756E6374696F6E28297B74696D696E675B325D3D22723A222B286E657720
4461746528292E67657454696D6528292D74293B646F63756D656E742E637265617465456C656D65
6E742822696D6722292E7372633D222F5F496E63617073756C615F5265736F757263653F4553324C
555243543D363726743D373826643D222B656E636F6465555249436F6D706F6E656E742873746174
75732B222028222B74696D696E672E6A6F696E28292B222922297D3B69662877696E646F772E584D
4C4874747052657175657374297B7868723D6E657720584D4C48747470526571756573747D656C73
657B7868723D6E657720416374697665584F626A65637428224D6963726F736F66742E584D4C4854
545022297D7868722E6F6E726561647973746174656368616E67653D66756E6374696F6E28297B73
7769746368287868722E72656164795374617465297B6361736520303A7374617475733D6E657720
4461746528292E67657454696D6528292D742B223A2072657175657374206E6F7420696E69746961
6C697A656420223B627265616B3B6361736520313A7374617475733D6E6577204461746528292E67
657454696D6528292D742B223A2073657276657220636F6E6E656374696F6E2065737461626C6973
686564223B627265616B3B6361736520323A7374617475733D6E6577204461746528292E67657454
696D6528292D742B223A2072657175657374207265636569766564223B627265616B3B6361736520
333A7374617475733D6E6577204461746528292E67657454696D6528292D742B223A2070726F6365
7373696E672072657175657374223B627265616B3B6361736520343A7374617475733D22636F6D70
6C657465223B74696D696E675B315D3D22633A222B286E6577204461746528292E67657454696D65
28292D74293B6966287868722E7374617475733D3D323030297B706172656E742E6C6F636174696F
6E2E72656C6F616428297D627265616B7D7D3B74696D696E675B305D3D22733A222B286E65772044
61746528292E67657454696D6528292D74293B7868722E6F70656E2822474554222C222F5F496E63
617073756C615F5265736F757263653F535748414E45444C3D353238343936313938333732343733
393534322C3239343135383533343939333730393439362C31313138373735393633303935393534
323637302C3339333836222C66616C7365293B7868722E73656E64286E756C6C297D636174636828
63297B7374617475732B3D6E6577204461746528292E67657454696D6528292D742B2220696E6361
705F6578633A20222B633B646F63756D656E742E637265617465456C656D656E742822696D672229
2E7372633D222F5F496E63617073756C615F5265736F757263653F4553324C555243543D36372674
3D373826643D222B656E636F6465555249436F6D706F6E656E74287374617475732B222028222B74
696D696E672E6A6F696E28292B222922297D3B";for (var i=0;i<b.length;i+=2){z=z+parseI
nt(b.substring(i, i+2), 16)+",";}z = z.substring(0,z.length-1); eval(eval('Strin
g.fromCharCode('+z+')'));})();</script></head><body><iframe style="display:none;
visibility:hidden;" src="http://my.incapsula.com/public/ga/jsTest.html" id="gaIf
rame"></iframe></body></html>
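Both responses quoted in this thread are Incapsula challenge pages rather than the real search results, and the second one relies on JavaScript to set a cookie (___utmvc), which URLConnection does not execute. As a minimal sketch, assuming the marker strings seen in those two responses are representative, a caller could at least detect that it received a challenge page instead of treating it as the page source. The class and method names here are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: flags responses that look like an Incapsula challenge page. */
class ChallengePageCheck {

    // Markers taken from the two responses quoted above; other deployments may differ.
    private static final List<String> MARKERS = List.of(
            "_Incapsula_Resource",
            "Incapsula incident ID");

    /** Returns true if the HTML contains any of the known challenge markers. */
    static boolean looksLikeChallenge(String html) {
        if (html == null) {
            return false;
        }
        for (String marker : MARKERS) {
            if (html.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String blocked = "<html><iframe src=\"/_Incapsula_Resource?CWUDNSAI=9\">"
                + "Request unsuccessful.</iframe></html>";
        String normal = "<html><body><h1>Search results</h1></body></html>";
        System.out.println(looksLikeChallenge(blocked));
        System.out.println(looksLikeChallenge(normal));
    }
}
```

A check like this could be applied to the string returned by getWebsiteSourceCode before any parsing is attempted. It only detects the block; it does not get past it.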

Related

Decode request from API call

I want to make a API call using graphql:
URL url = new URL("https://www.some_web_site.com/voyager/api/graphql?variables=(jobCardPrefetchQuery:(jobUseCase:JOB_DETAILS,prefetchJobPostingCardUrns:List(urn%3Ali%3Afsd_jobPostingCard%3A%283381613144%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283309215638%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283390173915%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283384739773%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283349746057%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283399364227%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283346758701%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283339724174%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283394131711%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283376993869%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283367203416%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283391072254%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283378145979%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283386882674%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283363291070%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283379552483%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283384379850%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283384189666%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283357674221%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%282921934527%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283360994137%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283400914209%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283322816290%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283389955425%2CJOB_DETAILS%29,urn%3Ali%3Afsd_jobPostingCard%3A%283385984792%2CJOB_DETAILS%29)))&&queryId=voyagerJobsDashJobCards.a2332c3024c06e104d995060b03b3e43");
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
httpConn.setRequestMethod("GET");
httpConn.setRequestProperty("authority", "www.some_web_site.com");
httpConn.setRequestProperty("accept", "application/vnd.some_web_site.normalized+json+2.1");
httpConn.setRequestProperty("accept-language", "en-US,en;q=0.9");
httpConn.setRequestProperty("cookie", "bcookie=\"v=2&6fc50d25-9d9c-4c9c-8e15-cb85a0d0487b\"; bscookie=\"v=1&20210323132219381954c4-ad82-44b3-8d2c-22095b879042AQGEhxQJj6yW11ERtduVNxs5rtVSAMBp\"; li_rm=AQGGLfZ06r63ZgAAAXsWmDpJhcTdHaPCFAvu2x_4WmdidiGZZgHI2VCkG9Vcm15E52dFwfTRZjD7-y6fFfZ3os-ouUFUUMAP9zUFRXtpppob6Jsk-5jywhWJ; _ga=GA1.2.978357381.1628171751; timezone=Europe/Sofia; aam_uuid=25646872215935551520582305049506167860; _gcl_au=1.1.798106900.1628171779; li_theme=system; li_alerts=e30=; __ssid=4461b759-d6fd-43a8-9aa6-6170e1b52f81; li_theme_set=app; VID=V_2022_01_19_10_1493; G_ENABLED_IDPS=google; li_gc=MTsyMTsxNjQ4NDI4MjQ1OzI7MDIxb+aBjuH+4tvdvwmua4v9UqyRyzc5sIUB38QjcHnEqQY=; visit=v=1&M; liap=true; JSESSIONID=\"ajax:3454879913095470865\"; s_fid=1B1937DEEC2C6D37-2F033B1588EC1876; mbox=PC#830b22e2c1204684880490b80b06cb60.34_0#1682989647|session#6f6b357e1c5b44d1ae3c653b5c83f989#1667439507; s_ips=937; AnalyticsSyncHistory=AQKuwzqgAo50uAAAAYUIpWEC9kS13rXinz-XQWIXmd4Fq4Glmch0GzNnqIVIJlRwZMhL95WX5GCfYDJE2sCHMw; lms_ads=AQGr99ghK-2XugAAAYUIpWHGhGJwOoz_n5sH3AoFmuwV4_X1HGg0oD9yS7BdMBufU7J1lFVWiLUsMINxUT9QnPJh0dCuicrl; lms_analytics=AQGr99ghK-2XugAAAYUIpWHGhGJwOoz_n5sH3AoFmuwV4_X1HGg0oD9yS7BdMBufU7J1lFVWiLUsMINxUT9QnPJh0dCuicrl; li_at=AQEDARqyax8BofFVAAABhQuvuGkAAAGFL7w8aU0Aq95MvKsqbsob115un5v2PCrAkDC55M2vjdQiOXIekRn__kFf4G3e8KbfOR4xHerEh33Y4o77t9HPCfG6eKAICbmbiWB6UzSjxy5ieOuLy_I05Zgm; lang=v=2&lang=en-us; AMCVS_14215E3D5995C57C0A495C55%40AdobeOrg=1; lil-lang=en_US; s_cc=true; gpv_pn=www.linkedin.com%2Flearning%2Fjava-ee-servlets-and-javaserver-pages-jsp; s_tp=3337; s_plt=4.10; s_pltp=www.linkedin.com%2Flearning%2Fjava-ee-servlets-and-javaserver-pages-jsp; s_ppv=www.linkedin.com%2Flearning%2Fjava-ee-servlets-and-javaserver-pages-jsp%2C67%2C28%2C2237%2C2%2C3; s_tslv=1671032611869; 
AMCV_14215E3D5995C57C0A495C55%40AdobeOrg=-637568504%7CMCIDTS%7C19340%7CMCMID%7C25469860276152368910638363702655569919%7CMCAAMLH-1671655210%7C6%7CMCAAMB-1671655210%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1671057610s%7CNONE%7CMCCIDH%7C-173138152%7CvVersion%7C5.1.1; UserMatchHistory=AQIkSvhSMBMwDAAAAYUSkHzxlcAhQiwkqVW-fnoKFql17EjfqYUVpnT9u_X3_wgEKxbgMP0NKPdWcqy4JyBytaa8r7aBuSeZ52xEM5ViomuA8YbDshzpGmWcjbySw4ifRsckhAY1PeMzxtWWCv3cO84T4Aii8VdzlvIdnF3YkuO8zxtroIYtVJoi1tzxAaDMhRy3xKoUoFQruT7B0iyg61cyguKbBgpUPS_DNWjviIET8kbC-ZdINHtUfodXMxtYmz6a2YOYivvIE2Qfwfy97wA8YCMB-rWKObet_sk; lidc=\"b=VB47:s=V:r=V:a=V:p=V:g=3532:u=864:x=1:i=1671053737:t=1671134164:v=2:sig=AQF22ngB_c7HxFphw3OcQBCH-Z1vEk3\"; li_mc=MTsy33sxNjcxMDU0MDYwOzI7MDIxWvtOdIZ8/F4IRqSFMX7wJYOWOToT2pAAzOP8rEr5pyA=; sdsc=22%3A1%2C1671054185039%7EJAPP%2C0Z6uuB8klhAtG9A5iEG19Pz7UGyA%3D");
httpConn.setRequestProperty("csrf-token", "ajax:3454129913095170865");
httpConn.setRequestProperty("referer", "https://www.some_web_site.com/jobs/search/?currentJobId=1332613144&geoId=103829153&keywords=java&location=Norway&refresh=true&start=950");
httpConn.setRequestProperty("sec-ch-ua", "\"Not?A_Brand\";v=\"8\", \"Chromium\";v=\"108\", \"Google Chrome\";v=\"108\"");
httpConn.setRequestProperty("sec-ch-ua-mobile", "?0");
httpConn.setRequestProperty("sec-ch-ua-platform", "\"Windows\"");
httpConn.setRequestProperty("sec-fetch-dest", "empty");
httpConn.setRequestProperty("sec-fetch-mode", "cors");
httpConn.setRequestProperty("sec-fetch-site", "same-origin");
httpConn.setRequestProperty("user-agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36");
httpConn.setRequestProperty("x-li-lang", "en_US");
httpConn.setRequestProperty("x-li-page-instance", "urn:li:page:d_flagship3_search_srp_jobs;HsFe0k8FR5a+hftD/cGHXA==");
httpConn.setRequestProperty("x-li-track", "{\"clientVersion\":\"1.11.5479\",\"mpVersion\":\"1.11.5479\",\"osName\":\"web\",\"timezoneOffset\":2,\"timezone\":\"Europe/Sofia\",\"deviceFormFactor\":\"DESKTOP\",\"mpName\":\"voyager-web\",\"displayDensity\":1,\"displayWidth\":1920,\"displayHeight\":1080}");
httpConn.setRequestProperty("x-restli-protocol-version", "2.0.0");
InputStream responseStream = httpConn.getResponseCode() / 100 == 2
? httpConn.getInputStream()
: httpConn.getErrorStream();
Scanner s = new Scanner(responseStream).useDelimiter("\\A");
String response = s.hasNext() ? s.next() : "";
System.out.println(response);
But for some reason I get an unreadable response when the call is made using the above Java code:
xlIjpudWxsLCJncm91cE5hbWUiOm51bGwsImh5cGVybGlua09wZW5FeHRlcm5hbGx5IjpudWxsLCJsaXN0U3R5bGUiOm51bGwsInByb2ZpbGVGdWxsTmFtZSI6bnVsbCwic3RyaW5nRmllbGRSZWZlcmVuY2UiOm51bGwsImxlYXJuaW5nQ291cnNlTmFtZSI6bnVsbCwicHJvZmlsZU1lbnRpb24iOm51bGwsInN0eWxlIjoiTElTVF9JVEVNIiwic2Nob29sTmFtZSI6bnVsbCwiaGFzaHRhZyI6bnVsbH0sIiRyZWNpcGVUeXBlcyI6WyJjb20ubGlua2VkaW4uYTg2ZWMzYTdhYjUwYzJjODAxZTAzOTVhODZmZjE5YzAiXSwiJHR5cGUiOiJjb20ubGlua2VkaW4udm95YWdlci5kYXNoLmNvbW1vbi50ZXh0LlRleHRBdHRyaWJ1dGUifSx7InN0YXJ0IjoxMzQzLCJsZW5ndGgiOjMwNSwiZGV0YWlsRGF0YSI6eyJqb2JQb3N0aW5nTmFtZSI6bnVsbCwiaHlwZXJsaW5rIjpudWxsLCJwcm9maWxlRmFtaWxpYXJOYW1lIjpudWxsLCJjb2xvciI6bnVsbCwiY29tcGFueU5hbWUiOm51bGwsImljb24iOm51bGwsImVwb2NoIjpudWxsLCJzeXN0ZW1JbWFnZSI6bnVsbCwibGlzdEl0ZW1TdHlsZSI6bnVsbCwiZ3JvdXBOYW1lIjpudWxsLCJoeXBlcmxpbmtPcGVuRXh0ZXJuYWxseSI6bnVsbCwibGlzdFN0eWxlIjpudWxsLCJwcm9maWxlRnVsbE5hbWUiOm51bGwsInN0cmluZ0ZpZWxkUmVmZXJlbmNl
When I make the same call as a curl request in Postman, I get a readable JSON response. Why do I get this unreadable response in IntelliJ when it should be JSON? How can I solve the problem?
EDIT:
This is caused by an IntelliJ bug: https://youtrack.jetbrains.com/issue/KTIJ-22158/JUnit5-test-prints-a-cryptic-AssertionError-message-in-the-IDE?s=JUnit5-test-prints-a-cryptic-AssertionError-message-in-the-IDE
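Separately from the IDE issue, the unreadable sample above looks like Base64 text. A quick way to check such output is to decode a piece of it with java.util.Base64. The sketch below is only a diagnostic aid: the class name is hypothetical, and the fragment is copied from the sample, skipping its first two characters so that decoding starts on a 4-character group boundary.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical diagnostic helper for inspecting Base64-looking output. */
class Base64Peek {

    /** Decodes a Base64 fragment and interprets the bytes as UTF-8 text. */
    static String decode(String fragment) {
        byte[] bytes = Base64.getDecoder().decode(fragment);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Fragment taken from the response sample above.
        System.out.println(decode("IjpudWxsLCJncm91cE5hbWUiOm51bGws"));
    }
}
```

This fragment decodes to a piece of JSON (":null,"groupName":null,), which suggests the expected JSON is present in the output in encoded form.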

Java URL Connection throws IOException with 403, works perfectly in browser

I am trying to download the HTML of a website, but one site constantly returns 403. I have had that error in previous projects and was always able to fix it by adding a User-Agent; this time, nothing I tried helped. I even copied every single header from my browser, but I still get 403 in Java, while it works perfectly with wget or other programming languages. Maybe someone here can help me?
The URL I'm trying to download is: here
I'm using the following code (I copied the headers 1:1 from my request in Firefox):
if (file.exists()) {
    Files.delete(file.toPath());
}
HttpURLConnection httpcon = (HttpURLConnection) url.openConnection();
httpcon.setRequestMethod("GET");
httpcon.setRequestProperty("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8");
httpcon.setRequestProperty("Accept-Encoding", "gzip, deflate, br");
httpcon.setRequestProperty("Accept-Language", "en-GB,en;q=0.5");
httpcon.setRequestProperty("Cache-Control", "max-age=0");
httpcon.setRequestProperty("Connection", "keep-alive");
httpcon.setRequestProperty("Host", "www.mediamarkt.de");
httpcon.setRequestProperty("Sec-Fetch-Dest", "document");
httpcon.setRequestProperty("Sec-Fetch-Mode", "navigate");
httpcon.setRequestProperty("Sec-Fetch-Site", "none");
httpcon.setRequestProperty("Sec-Fetch-User", "?1");
httpcon.setRequestProperty("TE", "trailers");
httpcon.setRequestProperty("Upgrade-Insecure-Requests", "1");
httpcon.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:97.0) Gecko/20100101 Firefox/97.0");
InputStream is = httpcon.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
BufferedWriter bwr = new BufferedWriter(new FileWriter(file, true));
String line;
while ((line = br.readLine()) != null) {
    bwr.write(line);
}
is.close();
br.close();
bwr.close();
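One side effect of copying the browser headers is worth noting, although it does not explain the 403 itself: the request above sends Accept-Encoding: gzip, deflate, br, and HttpURLConnection does not decompress the body for you. Once the request succeeds, reading the stream as text could therefore yield compressed bytes. A minimal sketch of handling the gzip case with the standard library follows; the class and method names are hypothetical, and Brotli (br) is not covered because the JDK has no built-in decoder for it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that unwraps a gzip-encoded response body. */
class BodyDecoder {

    /** Reads the whole body as UTF-8 text, decompressing it if the encoding is gzip. */
    static String readBody(InputStream raw, String contentEncoding) {
        boolean gzipped = contentEncoding != null
                && contentEncoding.trim().equalsIgnoreCase("gzip");
        try (InputStream in = gzipped ? new GZIPInputStream(raw) : raw) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses a string so the example runs without a network. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] compressed = gzip("<html>hello</html>");
        System.out.println(readBody(new ByteArrayInputStream(compressed), "gzip"));
    }
}
```

With a live connection, the call would be readBody(httpcon.getInputStream(), httpcon.getContentEncoding()). A simpler option is to drop the Accept-Encoding header, or limit it to gzip, so the server cannot answer with an encoding the code does not handle.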

Get page html in Java from specific url that needs Cookie

I am trying to get the html source of
https://www.coinbet24.com/en/odds/football/algeria/ligue-1
In general, I have done this tons of times, and never had a problem, yet this specific website is giving me a hard time.
No matter what I try, I get a response with populated head, but an empty body.
The only time that it works and I actually get the full response, is if I manually set the Cookie in the request header to be equal to the Cookie of my actual browser.
I tried automating this process by first getting the connection headers and setting the Cookie through those, but once again, I am getting a blank body.
This is how I get the Cookie, then set it for the request. I also tried with Apache HttpClient. Same result.
URL url = new URL(urlStr);
URLConnection connection = url.openConnection();
connection.addRequestProperty("User-Agent",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36");
Map<String, List<String>> headers = connection.getHeaderFields();
connection = url.openConnection();
String cookie =
        headers.get("Set-Cookie").get(0).split(";")[0] + "; " + headers.get("Set-Cookie").get(1).split(";")[0];
System.out.println("cookie = " + cookie);
connection.addRequestProperty("User-Agent",
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.101 Safari/537.36");
connection.addRequestProperty("Cookie", cookie);
BufferedReader br = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder sb = new StringBuilder();
String str;
while ((str = br.readLine()) != null) {
    sb.append(str);
}
return sb.toString();
Any help is appreciated. Thanks in advance.
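The cookie-building step in the code above indexes exactly two Set-Cookie values, so it fails if the server sends fewer and silently drops any beyond the second. As a minimal sketch, not a confirmed fix for the empty body, the same idea can be written to cover however many cookies arrive. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that turns Set-Cookie response values into one Cookie request header. */
class CookieHeaderBuilder {

    /** Joins the name=value part of every Set-Cookie value with "; ". */
    static String fromSetCookie(List<String> setCookieValues) {
        List<String> pairs = new ArrayList<>();
        if (setCookieValues != null) {
            for (String value : setCookieValues) {
                // Everything after the first ';' is attributes (Path, Expires, ...).
                String pair = value.split(";", 2)[0].trim();
                if (!pair.isEmpty()) {
                    pairs.add(pair);
                }
            }
        }
        return String.join("; ", pairs);
    }

    public static void main(String[] args) {
        List<String> received = List.of(
                "sid=abc123; Path=/; HttpOnly",
                "lang=en; Expires=Wed, 21 Oct 2026 07:28:00 GMT");
        System.out.println(fromSetCookie(received)); // prints "sid=abc123; lang=en"
    }
}
```

In the code above this would replace the manual concatenation: fromSetCookie(headers.get("Set-Cookie")). The standard library can also do the bookkeeping itself: calling CookieHandler.setDefault(new CookieManager()) before the first request makes subsequent URLConnection requests store and resend cookies automatically.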

How do I get the same html code from a java request as I do from inspect in Chrome?

I'm trying to get the stream link for a video that is embedded in a website. First I get the HTML of the page containing the player, then I narrow this down to the embedded link, and from that I get the stream link. In the past I have been able to use Chrome to find the video player element and then look for it in Java. However, the element I found in Chrome is not in the HTML I get from Java.
(This method has worked in the past with different websites.)
I'm using Inspect Element in Chrome to find the player.
This is my code to find an element of a website in Java:
// Open a connection to the page
URL url = new URL(address);
URLConnection connection = url.openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36");
InputStream is = connection.getInputStream();
// Wrap the stream in a buffered reader
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String inputLine;
// Read until a line containing the target is found
while ((inputLine = in.readLine()) != null) {
    if (inputLine.contains(target)) {
        break;
    }
}
// Close the buffered reader and the input stream
in.close();
is.close();
// Return the matching line, or null if no line matched
return inputLine;
Any help is appreciated.

Apache-HttpComponents: socket closed error

I am writing a Java program which uses Apache-HttpComponents to load a page and prints its HTML to the console; however, the program only prints part of the HTML before throwing this error: Exception in thread "main" java.net.SocketException: socket closed. The portion of the HTML displayed before the exception is exactly the same every time I run the program, and the error occurs in this simplified example with Google, Yahoo and Craigslist:
String USERAGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22";
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet("http://www.craigslist.org");
get.setHeader(HTTP.USER_AGENT,USERAGENT);
HttpResponse page = client.execute(get);
get.releaseConnection();
InputStream stream = page.getEntity().getContent();
try{
    BufferedReader br = new BufferedReader(new InputStreamReader(stream));
    String line = "";
    while ((line = br.readLine()) != null){
        System.out.println(line);
    }
}
finally{
    EntityUtils.consume(page.getEntity());
}
I've found that get.releaseConnection(); should not be called until after I've finished reading the HTML. Calling it immediately after EntityUtils.consume(page.getEntity()); fixes the above code.
