I want to download the HTML source code of a site to parse some info. How do I accomplish this in Java?
Just attach a BufferedReader (or anything that reads strings) to the InputStream returned by the URL's openStream() method:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public static void main(String[] args) throws IOException {
    URL url = new URL("http://stackoverflow.com/");
    // try-with-resources closes the stream when done
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openStream()))) {
        String s;
        while ((s = reader.readLine()) != null)
            System.out.println(s);
    }
}
You can use the Java classes directly:
URL url = new URL("http://www.example.com");
URLConnection conn = url.openConnection();
InputStream in = conn.getInputStream();
...
but it's recommended to use Apache HttpClient instead, since it handles many things (redirects, cookies, connection management) that you would otherwise have to do yourself with the native Java classes.
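For example, a minimal sketch with Apache HttpClient 4.x (this assumes the httpclient dependency is on the classpath; the class name and URL are placeholders):
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class FetchPage {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both the client and the response
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(
                     new HttpGet("http://www.example.com"))) {
            // EntityUtils consumes the response body and returns it as a String
            String html = EntityUtils.toString(response.getEntity());
            System.out.println(html);
        }
    }
}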
I'm trying to upload an image with Java to a self-hosted ActiveCollab.
I have run a couple of tests and this one seems by far the most solid. However, when I run it, I get a 200 OK status and an empty array as the response.
public static void main(String args[]) throws IOException {
    URL url = new URL("<SITE>/api/v1/upload-files");
    HttpURLConnection c = (HttpURLConnection) url.openConnection();
    c.setDoOutput(true);
    c.setRequestProperty("Content-Type", "multipart/form-data");
    c.setRequestProperty("X-Angie-AuthApiToken", "<TOKEN>");

    JSONArray array = new JSONArray();
    array.put("/test.png");
    array.put("image/png");

    OutputStream out = c.getOutputStream();
    out.write(array.toString().getBytes());
    out.flush();
    out.close();

    BufferedReader buf = new BufferedReader(new InputStreamReader(c.getInputStream()));
    StringBuffer response = new StringBuffer();
    String line;
    while (null != (line = buf.readLine())) {
        response.append(line);
    }
    JSONArray message = new JSONArray(response.toString());
    System.out.println(message);
}
According to the API documentation I should get a populated JSON array as the response, so I don't know what I'm missing.
Finally I solved it! As @StephanHogenboom said, the problem was in the multipart/form-data: the parameters had to go there, not in a JSONArray. I didn't find much information about how to work with multipart in java.net, but at least I found a deprecated but functional way to do the job.
public static void main(String args[]) throws IOException {
    URL url = new URL("<SITE>/api/v1/upload-files");
    HttpURLConnection c = (HttpURLConnection) url.openConnection();
    c.setDoOutput(true);
    c.setRequestMethod("POST");
    c.setRequestProperty("X-Angie-AuthApiToken", "<TOKEN>");

    // Build the multipart body (MultipartEntity is deprecated but works)
    File file = new File("/1.png");
    FileBody fileBody = new FileBody(file, "image/png");
    MultipartEntity multipartEntity = new MultipartEntity(HttpMultipartMode.STRICT);
    multipartEntity.addPart("file", fileBody);

    // Let the entity supply the Content-Type, including the multipart boundary
    c.setRequestProperty("Content-Type", multipartEntity.getContentType().getValue());
    OutputStream out = c.getOutputStream();
    multipartEntity.writeTo(out);
    out.close();

    BufferedReader buf = new BufferedReader(new InputStreamReader(c.getInputStream()));
    StringBuffer response = new StringBuffer();
    String line;
    while (null != (line = buf.readLine())) {
        response.append(line);
    }
    JSONArray message = new JSONArray(response.toString());
    System.out.println(message);
}
It works for me, but if anyone has ideas about how to improve it, that would be great!
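One idea (an untested sketch, not a drop-in fix): HttpClient 4.3+ replaces the deprecated MultipartEntity with MultipartEntityBuilder in the same httpmime module. The endpoint, token header, and file path below are carried over from the code above:
import java.io.File;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.mime.MultipartEntityBuilder;

public class UploadSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("<SITE>/api/v1/upload-files");
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setDoOutput(true);
        c.setRequestMethod("POST");
        c.setRequestProperty("X-Angie-AuthApiToken", "<TOKEN>");

        HttpEntity entity = MultipartEntityBuilder.create()
                .addBinaryBody("file", new File("/1.png"),
                        ContentType.IMAGE_PNG, "1.png")
                .build();
        // The builder computes the multipart boundary, so reuse its Content-Type
        c.setRequestProperty("Content-Type", entity.getContentType().getValue());

        try (OutputStream out = c.getOutputStream()) {
            entity.writeTo(out);
        }
        System.out.println(c.getResponseCode());
    }
}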
Hi, I have been trying to search for bugs in Bugzilla through its REST API. To get the bugs I wrote the Java code below, which returns a 406 error.
public static void main(String[] args) throws IOException, JsonParser.ParseException,
        JSONException, ParseException {
    URL url = new URL("http:mybugzilla.com/bug");
    HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
    urlConnection.setRequestProperty("Accept", "application/json");
    urlConnection.setRequestMethod("GET");
    urlConnection.setDoOutput(true);
    urlConnection.connect();
    PrintStream printStream = new PrintStream(urlConnection.getOutputStream());
    //printStream.print();
    BufferedReader br = new BufferedReader(new
            InputStreamReader(urlConnection.getInputStream()));
    String line;
    StringBuilder sb = new StringBuilder();
    while ((line = br.readLine()) != null) {
        sb.append(line).append("\n");
    }
    System.out.println(sb);
}
According to this API doc, you are probably missing something in your request string.
At the very least, your URL url = new URL("http:mybugzilla.com/bug") should be
URL url=new URL("http://mybugzilla.com/bug")
A sample written in Python to get the list of public bugs from Bugzilla 5.x, using the REST API:
import requests
url_bz_restapi = 'http://localhost/bugzilla/rest.cgi/bug'
r = requests.get(url_bz_restapi)
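For comparison, a minimal Java sketch of the same request (the localhost URL mirrors the Python sample; it deliberately omits setDoOutput(true), since with HttpURLConnection opening an output stream can silently turn a GET into a POST):
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class BugzillaRest {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost/bugzilla/rest.cgi/bug");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        // A GET has no request body, so no output stream is opened
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = br.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}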
I'm building a web crawler. Having read this, I understand that DNS resolution is slow, so we should separate out the DNS resolver.
So say that you have
String urlString = "http://google.com";
you can then convert that into an IP by doing
URL url = new URL(urlString);
InetAddress ip = InetAddress.getByName(url.getHost());
But then how do you download the actual website itself?
With the URL, we could just do something like this:
String htmlDocumentString = new Scanner(url.openStream(), "UTF-8").useDelimiter("\\A").next();
But if we want to use the resolved IP, do we have to manually reconstruct the URL with the IP? There is no url.setHost() method; it just seems kind of messy.
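One way to use the pre-resolved address without rebuilding the URL, sketched under the assumption that plain HTTP on port 80 is enough: connect a socket to the IP and send the Host header yourself. (HttpURLConnection silently drops a user-set Host header unless the sun.net.http.allowRestrictedHeaders system property is enabled, which is why a raw socket is shown; this sketch handles no HTTPS, redirects, or chunked decoding.)
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.Socket;

public class ResolvedFetch {
    public static void main(String[] args) throws Exception {
        String host = "google.com";
        InetAddress ip = InetAddress.getByName(host); // from your separate DNS stage
        try (Socket socket = new Socket(ip, 80);
             PrintWriter out = new PrintWriter(socket.getOutputStream());
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), "UTF-8"))) {
            // The Host header names the site we want even though we connected by IP
            out.print("GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n");
            out.flush();
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}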
Reading from a URL is simple:
public class URLReader {
    public static void main(String[] args) throws Exception {
        URL oracle = new URL("http://www.oracle.com/");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(oracle.openStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null)
            System.out.println(inputLine);
        in.close();
    }
}
Taken from: http://docs.oracle.com/javase/tutorial/networking/urls/readingURL.html
Try this instead:
URL oracle = new URL("http://www.oracle.com/");
URLConnection urlc = oracle.openConnection();
urlc.setDoInput(true);
urlc.setRequestProperty("Accept", "text/html"); // "text/text" is not a registered MIME type
InputStream inputStream = urlc.getInputStream();
String myString = IOUtils.toString(inputStream, "UTF-8");
... using IOUtils from Apache Commons above:
http://commons.apache.org/io/api-1.4/org/apache/commons/io/IOUtils.html#toString(java.io.InputStream,%20java.lang.String)
public java.lang.StringBuffer getRequestURL()
I am using this method to call the API of another website, which returns XML data as the response. Is this the right method to use for an HTTP request/response?
No. getRequestURL() only reconstructs the URL of the incoming request; it does not make outgoing requests. You should use new URL(url).openConnection(), or an abstraction such as Apache HttpComponents or a REST client.
If you want to make HTTP requests from within a Servlet, you do it as you would from any other process. Something like this:
public static void main(String[] args) throws Exception {
    URL url = new URL("http://www.targetdomain.com/api?key1=value1&key2=value2...");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setConnectTimeout(5000); // 5 seconds
    conn.setRequestMethod("GET");
    conn.connect();

    BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    StringBuffer bf = new StringBuffer();
    while ((line = rd.readLine()) != null) {
        bf.append(line);
    }
    conn.disconnect();
    //... pass bf to an XML parser and do your processing...
}
Depending on which XML parser you're using, you can probably skip buffering the response in a StringBuffer and instead pass the response InputStream to your parser directly.
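For instance, a sketch handing the stream straight to the JDK's built-in DOM parser (the URL is the placeholder from the example above):
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlFetch {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://www.targetdomain.com/api?key1=value1").openConnection();
        try (InputStream in = conn.getInputStream()) {
            // DocumentBuilder.parse accepts an InputStream directly,
            // so no intermediate StringBuffer is needed
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in);
            System.out.println(doc.getDocumentElement().getNodeName());
        } finally {
            conn.disconnect();
        }
    }
}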
How do I do an HTTP GET in Java?
If you want to stream any webpage, you can use the method below.
import java.io.*;
import java.net.*;

public class c {
    public static String getHTML(String urlToRead) throws Exception {
        StringBuilder result = new StringBuilder();
        URL url = new URL(urlToRead);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = reader.readLine()) != null; ) {
                result.append(line);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(getHTML(args[0]));
    }
}
Technically you could do it with a straight TCP socket, but I wouldn't recommend it. I would highly recommend you use Apache HttpClient instead. In its simplest form:
GetMethod get = new GetMethod("http://httpcomponents.apache.org");
// execute method and handle any error responses.
...
InputStream in = get.getResponseBodyAsStream();
// Process the data from the input stream.
get.releaseConnection();
and here is a more complete example.
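A fuller sketch against the legacy Commons HttpClient 3.x API that GetMethod belongs to (3.x is end-of-life; the modern HttpComponents API differs):
import java.io.InputStream;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

public class GetExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = new HttpClient();
        GetMethod get = new GetMethod("http://httpcomponents.apache.org");
        try {
            int status = client.executeMethod(get);
            if (status == HttpStatus.SC_OK) {
                // Stream the body instead of loading it into memory at once
                InputStream in = get.getResponseBodyAsStream();
                int b;
                while ((b = in.read()) != -1) {
                    System.out.write(b);
                }
                System.out.flush();
            }
        } finally {
            get.releaseConnection();
        }
    }
}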
If you don't want to use external libraries, you can use the URL and URLConnection classes from the standard Java API.
An example looks like this:
String urlString = "http://wherever.com/someAction?param1=value1&param2=value2....";
URL url = new URL(urlString);
URLConnection conn = url.openConnection();
InputStream is = conn.getInputStream();
// Do what you want with that stream
The simplest way that doesn't require third-party libraries is to create a URL object and then call either openConnection or openStream on it. Note that this is a pretty basic API, so you won't have much control over the headers.