I'm trying to download the content of a webpage with this code, but the result is not the same as what Firefox shows.
URL url = new URL("https://jumpseller.cl/support/webpayplus/");
InputStream is = url.openStream();
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
When I check /tmp/asdfasdf, it is not the HTML source of the page, just binary bytes (no text). Yet in Firefox I can see the webpage and its source code.
How can I get the real webpage?
You need to examine the response headers. The page is compressed. The Content-Encoding header has a value of gzip.
Try this:
URL url = new URL("https://jumpseller.cl/support/webpayplus/");
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
if ("gzip".equals(conn.getContentEncoding())) {
is = new GZIPInputStream(is);
}
Files.copy(is, Paths.get("/tmp/asdfasdf"), StandardCopyOption.REPLACE_EXISTING);
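The same check can be tried without touching the network. Below is a minimal sketch, assuming Java 9+ (for readAllBytes); the class ContentDecoding and its helpers decode, gzip and readAll are hypothetical names made up for this illustration, not part of any library. It compresses a small string in memory and then decodes it the way the answer above does, except that the encoding name is compared case-insensitively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ContentDecoding {

    // Hypothetical helper: wraps the body in a GZIPInputStream only when the
    // Content-Encoding value says the body is gzip-compressed.
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body; // no encoding, or one this sketch does not handle
    }

    // Produces gzip-compressed bytes so the example needs no network access.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Reads the whole stream as UTF-8 text (readAllBytes requires Java 9+).
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("<html>hello</html>");
        // Stands in for conn.getInputStream() and conn.getContentEncoding().
        InputStream decoded = decode(new ByteArrayInputStream(compressed), "gzip");
        System.out.println(readAll(decoded));
    }
}
```

With a real connection, decode(conn.getInputStream(), conn.getContentEncoding()) would take the place of the if block.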
Use the HtmlUnit library and this code:
try(final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
HtmlPage page = webClient.getPage("https://jumpseller.cl/support/webpayplus/");
webClient.waitForBackgroundJavaScript(5 * 1000); // wait for scripts only after the page has been loaded
String stringToSave = page.asXml(); // The full HTML as a string; save it to a file if needed.
// No explicit close() is needed: try-with-resources closes the client.
}
Related
I am trying to download a file from this URL, but the code hangs at getInputStream().
The URL is accessible when I type it into the browser:
http://filehost.blob.core.windows.net/firmware/version.txt
What is the cause?
URL url = new URL("http://filehost.blob.core.windows.net/firmware/version.txt");
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
urlConnection.setRequestMethod("GET");
urlConnection.setDoOutput(true);
urlConnection.connect();
InputStream inputStream = urlConnection.getInputStream(); //hang at this line
int totalSize = urlConnection.getContentLength();
READING THE FILE CONTENT
SOLUTION
Use URL with Scanner.
CODE
URL url = new URL("http://filehost.blob.core.windows.net/firmware/version.txt");
Scanner s = new Scanner(url.openStream());
while (s.hasNextLine())
System.out.println(s.nextLine());
s.close();
OUTPUT
1.016
NOTE MalformedURLException and IOException must be thrown or handled.
DOWNLOADING THE FILE
SOLUTION
Use JAVA NIO.
CODE
URL website = new URL("http://filehost.blob.core.windows.net/firmware/version.txt");
ReadableByteChannel rbc = Channels.newChannel(website.openStream());
FileOutputStream fos = new FileOutputStream("C:/temp/version.txt");
fos.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
fos.close();
OUTPUT The file has been created at C:\temp\version.txt with a size of 5 bytes.
NOTE MalformedURLException, FileNotFoundException and IOException must be thrown or handled.
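As a variation on the download code above, Files.copy with try-with-resources closes the stream even when the copy fails. This is only a sketch: StreamSaver and save are hypothetical names, and an in-memory stream with a temporary file stands in for url.openStream() and the real target path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamSaver {

    // Hypothetical helper: copies the whole stream to the target file and
    // returns the number of bytes written. The stream is always closed.
    static long save(InputStream source, Path target) throws IOException {
        try (InputStream in = source) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file and in-memory bytes replace the real URL and path here.
        Path target = Files.createTempFile("version", ".txt");
        InputStream body = new ByteArrayInputStream("1.016".getBytes(StandardCharsets.US_ASCII));
        long written = save(body, target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

For the real file, the call would look like save(website.openStream(), Paths.get("C:/temp/version.txt")).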
I tried your code snippet and could not reproduce your problem - it does not hang for me. I think that your network (configuration) may have some problems and that your code hangs until some timeout occurs.
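If a network problem is the cause, explicit timeouts at least turn an indefinite hang into an exception you can see. A minimal sketch, where TimeoutConfig and openWithTimeouts are hypothetical names and the 5 and 10 second values are arbitrary assumptions; nothing is sent over the network until connect() or getInputStream() is called.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class TimeoutConfig {

    // Hypothetical helper: prepares (but does not open) a connection with
    // explicit connect and read timeouts, both in milliseconds.
    static HttpURLConnection openWithTimeouts(String address, int connectMillis, int readMillis)
            throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        connection.setConnectTimeout(connectMillis); // limit for establishing the connection
        connection.setReadTimeout(readMillis);       // limit for each blocking read
        return connection;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection connection = openWithTimeouts(
                "http://filehost.blob.core.windows.net/firmware/version.txt", 5000, 10000);
        // Only the configuration is printed; no request is made in this sketch.
        System.out.println("connect timeout: " + connection.getConnectTimeout() + " ms");
        System.out.println("read timeout: " + connection.getReadTimeout() + " ms");
    }
}
```

With these set, a stalled getInputStream() should fail with a java.net.SocketTimeoutException instead of blocking forever.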
I want to hit a PHP URL from a Java servlet. I just want to send some information to that URL; I don't need to navigate to it, and I want to stay on my page. Is there any way to do that?
You could simply make a POST to that URL, as in the HttpComponents docs:
PostMethod post = new PostMethod("http://jakarta.apache.org/");
NameValuePair[] data = {
new NameValuePair("user", "joe"),
new NameValuePair("password", "bloggs")
};
post.setRequestBody(data);
// execute method and handle any error responses.
...
InputStream in = post.getResponseBodyAsStream();
// handle response.
Try the following code:
URL url = new URL("your url");
URLConnection connection = url.openConnection();
connection.setConnectTimeout(5000); // time out
connection.setDoOutput(true);
PrintWriter out = new PrintWriter(new OutputStreamWriter(connection.getOutputStream()));
String postData = "your post data";
out.print(postData);
out.close();
InputStream response = connection.getInputStream();
Or you can use the fluent API of Apache HttpClient:
Request.Post("your url").bodyForm(
Form.form().add("parameter name", "value")
.add("parameter name1", "value").build()).execute();
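When the post data is written by hand, as in the URLConnection variant above, names and values usually need URL encoding. A small sketch of building such a body, assuming Java 10+ for URLEncoder.encode(String, Charset); FormBody and encode are hypothetical names, and the field names are made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class FormBody {

    // Hypothetical helper: builds an application/x-www-form-urlencoded body,
    // encoding every name and value and joining the pairs with '&'.
    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("user", "joe");
        fields.put("note", "a b&c");
        System.out.println(encode(fields));
    }
}
```

The resulting string would be used as postData, typically together with connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded").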
I'm trying to download a file from a given URL which may or may not be a direct link to the file. Does anyone know how I can detect the filename to write to if the URL is an indirect link (e.g. http://www.example.com/download.php?getFile=1)?
If the URL is a direct link, it is no problem to extract the filename from it and start writing to that file. With a redirect link, however, the only method I have found so far is to write to an arbitrary filename (foo.txt) and then try to work with that. The problem is that I really need the filename (and extension) to be correct.
A sample of the code I am using (the section in the 'else' clause is neither finished nor working):
public static boolean dlFile(String URL, String dest){
try{
URL grab = new URL(URL);
ReadableByteChannel rbc = Channels.newChannel(grab.openStream());
String fnRE = ".*/([a-zA-Z0-9\\-\\._]+)$";
Pattern pattern = Pattern.compile(fnRE);
Matcher matcher = pattern.matcher(URL);
String fName = "";
if(matcher.find()) fName = matcher.group(1);
else { // filename cannot be extracted - do something here; the code below doesn't work and raises MalformedURLException
URL foo = new URL(URL);
HttpURLConnection fooConnection = (HttpURLConnection) foo.openConnection();
URL secondFoo = new URL(fooConnection.getHeaderField("Location"));
System.out.println("Redirect URL: "+secondFoo);
fooConnection.setInstanceFollowRedirects(false);
URLConnection fooURL = secondFoo.openConnection();
}
System.out.println("Connection to "+URL+" established!");
if(dest.endsWith("/")){}
else dest+="/";
System.out.println("Writing "+fName+" to "+dest);
FileOutputStream fos = new FileOutputStream(dest+fName);
fos.getChannel().transferFrom(rbc, 0, 1 << 24);
I am sure there must be a simple way to get the filename from the headers or something like that but I cannot work out how to get it. Thanks in advance,
Assuming the response has a "Location" header field, I was able to obtain the direct link to a url containing multiple redirects like this:
String location = "http://www.example.com/download.php?getFile=1";
HttpURLConnection connection = null;
for (;;) {
URL url = new URL(location);
connection = (HttpURLConnection) url.openConnection();
connection.setInstanceFollowRedirects(false);
String redirectLocation = connection.getHeaderField("Location");
if (redirectLocation == null) break;
location = redirectLocation;
}
//and finally:
String fileName = location.substring(location.lastIndexOf('/') + 1, location.length());
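Two details can trip up the loop above: a Location header may be relative, and the final URL may carry a query string that ends up in the file name. Here is a rough sketch of handling both with java.net.URI; UrlFileName, resolve and fromUrl are hypothetical names, and the example.com URLs are placeholders. URI.create throws IllegalArgumentException for strings that are not valid URIs.

```java
import java.net.URI;

class UrlFileName {

    // Hypothetical helper: resolves a possibly relative Location value
    // against the URL that produced the redirect.
    static String resolve(String currentUrl, String location) {
        return URI.create(currentUrl).resolve(location).toString();
    }

    // Hypothetical helper: returns the last path segment without the query
    // string, or null when the path has no file name part.
    static String fromUrl(String address) {
        String path = URI.create(address).getPath();
        if (path == null || path.isEmpty() || path.endsWith("/")) {
            return null;
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String start = "http://www.example.com/a/download.php?getFile=1";
        String next = resolve(start, "/files/report.pdf?token=abc");
        System.out.println(next);
        System.out.println(fromUrl(next));
    }
}
```

In the loop, location = resolve(location, redirectLocation) would replace the plain assignment, and fromUrl(location) the final substring call.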
I think it's better to use the Java jsoup library with the method below:
public static void downloadFileJsoup(String URL, String PATH) throws IOException {
Response res = Jsoup.connect(URL)
.userAgent("Mozilla")
.timeout(30000)
.followRedirects(true)
.ignoreContentType(true)
.maxBodySize(20000000)//Increase value if download is more than 20MB
.execute();
String remoteFilename=res.header("Content-Disposition").replaceFirst("(?i)^.*filename=\"?([^\"]+)\"?.*$", "$1");
String filename = PATH + remoteFilename;
FileOutputStream out = (new FileOutputStream(new java.io.File(filename)));
out.write( res.bodyAsBytes());
out.close();
}
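Note that res.header("Content-Disposition") returns null when the server does not send that header, so the replaceFirst call above would throw a NullPointerException. A more defensive parse might look like the sketch below; ContentDispositionName and fileName are hypothetical names, the extended filename*= form is deliberately not handled, and directory parts are stripped so a server-supplied name cannot point outside the target folder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentDispositionName {

    // Matches filename="quoted value" or filename=token (case-insensitive).
    private static final Pattern FILENAME = Pattern.compile(
            "filename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the file name from a Content-Disposition
    // value, or null when the header is missing or has no plain filename.
    static String fileName(String header) {
        if (header == null) {
            return null;
        }
        Matcher matcher = FILENAME.matcher(header);
        if (!matcher.find()) {
            return null;
        }
        String name = matcher.group(1) != null ? matcher.group(1) : matcher.group(2);
        // Keep only the part after the last '/' or '\'.
        int lastSeparator = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        return name.substring(lastSeparator + 1);
    }

    public static void main(String[] args) {
        System.out.println(fileName("attachment; filename=\"report.pdf\""));
        System.out.println(fileName("attachment; filename=data.csv; size=10"));
        System.out.println(fileName(null));
    }
}
```

When fileName returns null, falling back to a name derived from the final URL is one option.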
No, in general there is no way. The response doesn't normally contain that information unless you add your own protocol information to the data stream (in case you control the server).
Anyway, you are asking about the file name extension. The Content-Type header of the response may be enough to determine it.
I have tested the first step (the login page) and it works. I put in all the parameters (user, pass, etc.) and I can print the result (a page with my data). The problem is when I try to download a file from that website. I need the cookies from the first step. The file that I download contains the message "Expired session". This is my code:
URL login = new URL("...");
URL download_page = new URL("...");
URL document_link = new URL("...");
//String for request
String data_post = "username=name&password=1234&other_data=...";
//Login page
HttpURLConnection conn = (HttpURLConnection)login.openConnection();
conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.write(data_post);
wr.close();
conn.connect();
//Download page
HttpURLConnection connDownload = (HttpURLConnection)download_page.openConnection();
connDownload.connect();
//Link to the file
HttpURLConnection connFile = (HttpURLConnection)document_link.openConnection();
connFile.connect();
BufferedInputStream in = new BufferedInputStream(connFile.getInputStream());
File saveFile = new File("myfile.txt");
OutputStream out = new BufferedOutputStream(new FileOutputStream(saveFile));
byte[] buf = new byte[256];
int n = 0;
while ((n=in.read(buf))>=0) {
out.write(buf, 0, n);
}
out.flush();
out.close();
Thanks in advance.
Have you tried to check the headers for a cookie on the first page before closing the connection? I'd try something like:
String cookies = conn.getHeaderField("Set-Cookie");
Then set the cookie subsequently in the following connections, before executing connect(), using:
connDownload.setRequestProperty("Cookie", cookies);
... See if that works ...
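One caveat: a Set-Cookie value also carries attributes such as Path or HttpOnly, and a server may send several Set-Cookie headers. A minimal sketch of turning such values into a single Cookie request header follows; CookieHeader and fromSetCookie are hypothetical names, and the cookie values are made up.

```java
import java.util.List;
import java.util.StringJoiner;

class CookieHeader {

    // Hypothetical helper: keeps only the name=value part of each Set-Cookie
    // value and joins the pairs with "; " as a Cookie request header expects.
    static String fromSetCookie(List<String> setCookieValues) {
        StringJoiner cookies = new StringJoiner("; ");
        for (String value : setCookieValues) {
            int end = value.indexOf(';');
            String pair = (end >= 0 ? value.substring(0, end) : value).trim();
            if (!pair.isEmpty()) {
                cookies.add(pair);
            }
        }
        return cookies.toString();
    }

    public static void main(String[] args) {
        // Made-up values standing in for the Set-Cookie headers of the login response.
        List<String> received = List.of("JSESSIONID=abc123; Path=/; HttpOnly", "lang=en");
        System.out.println(fromSetCookie(received));
    }
}
```

With a real connection, the list would come from conn.getHeaderFields().get("Set-Cookie"), which can be null when no cookie was sent. Installing a java.net.CookieManager through CookieHandler.setDefault before the first request is another option that may remove the need for manual handling.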
I have a URLConnection that accesses a web page.
URL url = new URL("https://domain");
con = url.openConnection();
con.setDoOutput(true);
Then I send some data to the server using con.setRequestProperty().
I get the response cookies from a specific header field using
String headerValue = con.getHeaderField(6);
I also get the HTML and parse an image URL from it. But here is the problem: I can only get this image by sending the cached data back to the server when I access it.
So I open a new connection:
URL url1 = new URL("https://domain/image");
URLConnection con1 = url1.openConnection();
I send the cookies back to the server with con1.setRequestProperty("Cookie", headerValue);
Finally, I try to access the image using a BufferedInputStream and then create an image in a JLabel:
BufferedInputStream in = new BufferedInputStream(con1.getInputStream());
ByteArrayOutputStream byteArrayOut = new ByteArrayOutputStream();
int c;
while ((c = in.read()) != -1) {
byteArrayOut.write(c);
}
Image image = Toolkit.getDefaultToolkit().createImage(
byteArrayOut.toByteArray());
label.setIcon(new ImageIcon(image));
The problem is that this does not seem to work. Is there another way to get a file from a server through a URLConnection?
Error code
Server returned HTTP response code: 400 for URL: https://domain/image
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
Thanks in advance
Found the error. Case closed.
Used this code to split the cookies string.
String temp = headerValue.substring(0, (len1-17));