Is it possible to set cookies in htmlunit/webclient - java

I have this code, but I couldn't find a good solution and I have no idea what is going on with the cookies. Below are the code and the cookie warnings.
I'm trying to log in to the website, then use getPage to download a file and write it to my directory.
String url = "https://www.frw.co.uk/login";
final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_10);
webClient.getOptions().setRedirectEnabled(true);
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getCookieManager().setCookiesEnabled(true);
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
try {
    HtmlPage login = webClient.getPage(url); //button.click();
    final HtmlForm form = (HtmlForm) login.getElementById("loginForm");
    form.getInputByName("j_username").setValueAttribute("example#hotmail.com");
    form.getInputByName("j_password").setValueAttribute("examplepassword");
    HtmlPage reslogin = form.getInputsByValue("Login").get(0).click();
    //reslogin = webClient.getPage("https://www.frw.co.uk/excel/download");
    HtmlPage downloadPage = reslogin.getAnchorByText("Download Wine List").click();
    downloadPage = webClient.getPage("https://www.frw.co.uk/excel/download");
    WebResponse response2 = downloadPage.getWebResponse();
    InputStream is = response2.getContentAsStream();
    OutputStream outputStream = new FileOutputStream(new File(fileTarget));
    int read = 0;
    byte[] bytes = new byte[1024];
    while ((read = is.read(bytes)) != -1) {
        outputStream.write(bytes, 0, read);
    }
    outputStream.flush();
    outputStream.close();
    System.out.println("Cookie " + webClient.getCookieManager().getCookies());
    System.exit(1);
} catch (Exception e) {
    e.printStackTrace();
}
webClient.closeAllWindows();
The warnings I get:
WARNING: Invalid cookie header: "Set-Cookie: fixed_external_2276360454_end_user_id=; Domain=.optimizely.com; expires=Thu, 01 Jan 1970 00:00:00 GMT; Max-Age=-1". Negative max-age attribute: -1
WARNING: Cookie rejected: "[version: 0][name: end_user_id][value: oeu1447313832962r0.4258916180646556][domain: .2276360454.log.optimizely.com][path: /js][expiry: Sun Nov 09 15:37:28 CST 2025]". Illegal domain attribute "2276360454.log.optimizely.com". Domain of origin: "cdn.optimizely.com"
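These warnings appear to come from the underlying HTTP client rejecting malformed third-party cookies (here from Optimizely); they are usually harmless and do not block the login itself. If you do need to set a cookie by hand, HtmlUnit's CookieManager supports it. A minimal sketch, where the domain, name, and value are placeholders for whatever the site actually expects:
import com.gargoylesoftware.htmlunit.util.Cookie;

// Hypothetical cookie: replace domain/name/value as needed.
Cookie sessionCookie = new Cookie("www.frw.co.uk", "JSESSIONID", "some-session-id");
webClient.getCookieManager().addCookie(sessionCookie);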

Related

Empty response with jersey 2 client

I'm using the Jersey 2.16 client for fetching files;
some of the files come out empty when I try to read the response.
For example, while trying to fetch URL:
https://s1.yimg.com/uu/api/res/1.2/3LJG5Qp6cO9WVZ644ybK1A--/YXBwaWQ9eXRhY2h5b247aD0xNjQ7dz0yOTA7/https://ibdp.videovore.com/video/61260788?size=512x288
The response status is 200, I see the content-length header stating there should be 9081 bytes, but the very first call to inputStream.read returns -1.
Following is the code that downloads the data:
private ByteArrayOutputStream downloadFile(Response response) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream(1024);
    try {
        InputStream inputStream = response.readEntity(InputStream.class);
        byte[] bytes = new byte[1024];
        int readBytes = inputStream.read(bytes); // for the given URL this returns -1
        while (readBytes > 0) {
            outputStream.write(bytes, 0, readBytes);
            readBytes = inputStream.read(bytes);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return outputStream;
}
The response headers I get:
Server=ATS
Public-Key-Pins-Report-Only=max-age=2592000; pin-sha256="2fRAUXyxl4A1/XHrKNBmc8bTkzA7y4FB/GLJuNAzCqY="; pin-sha256="I/Lt/z7ekCWanjD0Cvj5EqXls2lOaThEA0H2Bg4BT/o="; pin-sha256="Wd8xe/qfTwq3ylFNd3IpaqLHZbh2ZNCLluVzmeNkcpw="; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="i7WTqTvh0OioIruIfFR4kMPnBqrS2rdiVPl/s2uC/CY="; pin-sha256="r/mIkG3eEpVdm+u/ko/cwxzOMo1bk4TyHIlByibiA5E="; pin-sha256="uUwZgwDOxcBXrQcntwu+kYFpkiVkOaezL0WYEZ3anJc="; pin-sha256="dolnbtzEBnELx/9lOEQ22e6OZO/QNb6VSSX2XHA3E7A="; includeSubdomains; report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-hpkp-report-only"
Last-Modified=Sun, 30 Dec 2018 19:10:17 GMT
P3P=policyref="https://policies.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR ADM DEV TAI PSA PSD IVAi IVDi CONi TELo OTPi OUR DELi SAMi OTRi UNRi PUBi IND PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA POL HEA PRE LOC GOV"
Referrer-Policy=no-referrer-when-downgrade
Strict-Transport-Security=max-age=15552000
X-Server-Processor=ymagine
X-XSS-Protection=1; mode=block
Content-Length=9081
Age=11549
Content-Type=image/jpeg
X-Content-Type-Options=nosniff
Connection=keep-alive
X-Server-Time-FetchImage=89603
X-Server-Time-Process=3800
Date=Mon, 07 Jan 2019 08:36:25 GMT
Via=http/1.1 e30.ycpi.lob.yahoo.com (ApacheTrafficServer [cRs f ])
Cache-Control=public, max-age=86400
ETag="5c291819-6ec1"
Content-Disposition=inline; filename=61260788?size=512x288.jpg
X-Image-Height=163
X-Image-Width=290
X-Server-Time-Total=93975
Expect-CT=max-age=31536000, report-uri="http://csp.yahoo.com/beacon/csp?src=yahoocom-expect-ct-report-only"
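One way to narrow this down is to let Jersey buffer the entity itself and read it as a byte array; if that also comes back empty, the problem lies in the request or the server's response rather than in the read loop. A minimal diagnostic sketch (the target URL is abbreviated):
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.core.Response;

Client client = ClientBuilder.newClient();
Response response = client.target("https://s1.yimg.com/uu/api/res/...").request().get();
byte[] body = response.readEntity(byte[].class); // Jersey buffers the whole entity
System.out.println("status=" + response.getStatus() + ", bytes=" + body.length);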

Java - Get filename from "http://www.example.com/something.php?id=1111"

After googling, I found that the file name is supposed to be in the Content-Disposition header field, but this link does not have that header. Here is the link:
http://www.songspk.link/link/song.php?songid=5558
In web browser, above link redirects to
http://sound6.mp3slash.net/indian/mumbai_salsa/mumbaisalsa04%28www.songs.pk%29.mp3
The code I used:
URL url = new URL("http://www.songspk.link/link/song.php?songid=5558");
HttpURLConnection conn = null;
try {
    conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("User-Agent",
            "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0");
    conn.setRequestMethod("GET");
    conn.setInstanceFollowRedirects(true);
    Map<String, List<String>> map = conn.getHeaderFields();
    for (String s : map.keySet()) {
        System.out.println(s);
        System.out.println("--->" + map.get(s));
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    if (conn != null) { // guard against openConnection() having failed
        conn.disconnect();
    }
}
I checked all the header fields and here is the list:
null
--->[HTTP/1.1 200 OK]
ETag
--->["98f85f68c5ddcf1:0"]
Date
--->[Wed, 23 Mar 2016 10:01:15 GMT]
Content-Length
--->[5777792]
Last-Modified
--->[Wed, 01 Oct 2014 22:16:54 GMT]
Accept-Ranges
--->[bytes]
Content-Type
--->[audio/mpeg]
X-Powered-By-Plesk
--->[PleskWin]
X-Powered-By
--->[ASP.NET]
Server
--->[Microsoft-IIS/7.5]
I need the original filename. I have no problem using an external library if it can solve my problem.
Just use the connection's getURL() method; once the response has been read, it returns the final, redirected URL:
System.out.println(conn.getURL());
Output:
http://sound6.mp3slash.net/indian/mumbai_salsa/mumbaisalsa04(www.songs.pk).mp3
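From there the file name can be cut out of the path. A minimal sketch (to run inside the existing try block, since URLDecoder.decode declares a checked exception); the %28/%29 escapes decode back into parentheses:
String path = conn.getURL().getPath(); // /indian/mumbai_salsa/mumbaisalsa04%28www.songs.pk%29.mp3
String name = path.substring(path.lastIndexOf('/') + 1);
String fileName = java.net.URLDecoder.decode(name, "UTF-8"); // mumbaisalsa04(www.songs.pk).mp3
System.out.println(fileName);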

Java - Download https page

I'm trying to download the content of a webpage with this code, but it does not get the same content that Firefox shows.
URL url = new URL("https://jumpseller.cl/support/webpayplus/");
InputStream is = url.openStream();
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
When I check /tmp/asdfasdf, it is not the HTML source code of the page, just raw bytes (no readable text). Yet in Firefox I can see the webpage and its source code.
How can I get the real webpage?
You need to examine the response headers. The page is compressed. The Content-Encoding header has a value of gzip.
Try this:
URL url = new URL("https://jumpseller.cl/support/webpayplus/");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
if ("gzip".equals(conn.getContentEncoding())) {
is = new GZIPInputStream(is);
}
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
Use the HtmlUnit library and this code:
try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setUseInsecureSSL(true);
    HtmlPage page = webClient.getPage("https://jumpseller.cl/support/webpayplus/");
    webClient.waitForBackgroundJavaScript(5 * 1000); // wait after loading, so background JavaScript can finish
    String stringToSave = page.asXml(); // the full HTML after JavaScript has run; save it to a file if needed
    // no explicit close() needed: try-with-resources already closes the WebClient
}

downloading files behind javascript button with htmlunit

I am trying to download an attachment that sits behind a JavaScript button with HtmlUnit. Other tasks work great (e.g. navigating, logging in).
I checked out the attachment unit test, but it didn't help me.
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_24);
final HtmlPage page1 = webClient.getPage(loginUrl);
final HtmlTextInput textField = page1.getElementByName("user");
final HtmlPasswordInput pwd = page1.getElementByName("pwd");
textField.setValueAttribute(User.getUsername());
pwd.setValueAttribute(User.getPassword());
final HtmlSubmitInput button = page1.getElementByName("login");
final HtmlPage page2 = button.click();
String buttonJavaScript = "window.location='" + folder + filename + ....... ";
ScriptResult result = page2.executeJavaScript(buttonJavaScript);
webClient.waitForBackgroundJavaScript(2000);
InputStream is = result.getNewPage().getWebResponse().getContentAsStream();
try {
    File f = new File("filename.extension");
    OutputStream os = new FileOutputStream(f);
    byte[] bytes = new byte[1024];
    int read;
    while ((read = is.read(bytes)) != -1) {
        os.write(bytes, 0, read);
    }
    os.close();
    is.close();
} catch (IOException ex) {
    // Exception handling
}
However, it stops with:
runtimeError: message=[No node attached to this object] sourceName=[http://pagead2.googlesyndication.com/pagead/osd.js] line=[7] lineSource=[null] lineOffset=[0]
The file created is size 0.
There must be a way to get to the real file attached?!
Thank you in advance
Just in case anyone else is wondering: you need to use the AttachmentHandler. Register a CollectingAttachmentHandler on the WebClient before triggering the download, then take the attachment from the list:
List<Attachment> attachments = new ArrayList<>();
webClient.setAttachmentHandler(new CollectingAttachmentHandler(attachments));
ScriptResult result = page2.executeJavaScript(buttonJavaScript);
webClient.waitForBackgroundJavaScript(1000);
if (attachments.size() > 0) {
    Attachment attachment = attachments.get(0);
    Page attachedPage = attachment.getPage();
    WebResponse attachmentResponse = attachedPage.getWebResponse();
    String content = attachmentResponse.getContentAsString();
    ... write(content);
}
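If the attachment is binary (an xls or zip, say), reading it as a String will corrupt it. A minimal sketch that streams the bytes to disk instead; the file name is a placeholder and IOException handling is left to the caller:
try (InputStream in = attachmentResponse.getContentAsStream();
     OutputStream out = new FileOutputStream("attachment.bin")) { // placeholder file name
    byte[] buffer = new byte[1024];
    int read;
    while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
    }
}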

Download a file from a page (cookies needed) using post request

I have tested the first step (the login page) and it works: I post all the parameters (user, pass, etc.) and I can print the result (a page with my data). The problem comes when I try to download a file from that site; I need the cookies from the first step. The file I download contains the message: "Expired session". This is my code:
URL login = new URL("...");
URL download_page = new URL("...");
URL document_link = new URL("...");
// String for the POST request
String data_post = "username=name&password=1234&other_data=...";
// Login page
HttpURLConnection conn = (HttpURLConnection) login.openConnection();
conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.write(data_post);
wr.close();
conn.connect();
// Download page
HttpURLConnection connDownload = (HttpURLConnection) download_page.openConnection();
connDownload.connect();
// Link to the file
HttpURLConnection connFile = (HttpURLConnection) document_link.openConnection();
connFile.connect();
BufferedInputStream in = new BufferedInputStream(connFile.getInputStream());
File saveFile = new File("myfile.txt");
OutputStream out = new BufferedOutputStream(new FileOutputStream(saveFile));
byte[] buf = new byte[256];
int n = 0;
while ((n = in.read(buf)) >= 0) {
    out.write(buf, 0, n);
}
out.flush();
out.close();
Thanks in advance.
Have you tried checking the login response headers for a cookie before closing the connection? I'd try something like:
String cookies = conn.getHeaderField("Set-Cookie");
Then set that cookie on the subsequent connections, before calling connect():
connDownload.setRequestProperty("Cookie", cookies);
... See if that works ...
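Note that getHeaderField("Set-Cookie") returns only one header, and servers often send several. A minimal sketch (untested against this particular site) that collects them all and replays the name=value pairs:
// Gather every Set-Cookie header from the login response.
java.util.List<String> setCookies = conn.getHeaderFields().get("Set-Cookie");
StringBuilder cookieHeader = new StringBuilder();
if (setCookies != null) {
    for (String c : setCookies) {
        if (cookieHeader.length() > 0) {
            cookieHeader.append("; ");
        }
        cookieHeader.append(c.split(";", 2)[0]); // keep only the name=value part
    }
}
connDownload.setRequestProperty("Cookie", cookieHeader.toString());
connFile.setRequestProperty("Cookie", cookieHeader.toString());
Alternatively, installing a default cookie store with CookieHandler.setDefault(new java.net.CookieManager()) before opening any connection lets HttpURLConnection carry the session cookie across requests automatically.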
