I need to make a web crawler to gather links and information from a specific website. I also need to use Apache HttpClient to do it, and I've been looking over the tutorial on the website for a few days and getting nowhere. Right now, I'm trying to figure out how to use Apache HttpClient to grab the HTML so I can parse it. Frankly, I may be misunderstanding what HttpClient is supposed to be used for. Any help would be greatly appreciated.
Here it is, but do not be surprised if what you get is not what you see in the browser. As I said, you will get what the server actually returns for the request:
HttpClient client = new HttpClient();
HostConfiguration hostConfig = new HostConfiguration();
hostConfig.setHost("my.site.com", 80, Protocol.getProtocol("http"));
client.setHostConfiguration(hostConfig);
GetMethod getHtmlPageMethod = new GetMethod("/myPage.html");
getHtmlPageMethod.setFollowRedirects(true);
try {
int responseCode = client.executeMethod(getHtmlPageMethod);
System.out.println("Got response code: " + responseCode);
if (200 == responseCode) {
System.out.println("Response code 200 - SUCCESS ... go for response body... ");
String responseBody = getHtmlPageMethod.getResponseBodyAsString();
if (null != responseBody) {
System.out.println("Got body string:" + System.lineSeparator());
System.out.println(responseBody);
} else
{
System.out.println("No response body returned!");
}
}
} catch (Exception e) {
e.printStackTrace();
}
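Once the response body is available as a string, the "gather links" part of the crawler can be sketched without any extra library. This is only a minimal illustration using the JDK regex classes; the class name `LinkExtractor` and the sample HTML are made up for the example, and a real crawler would more likely hand the body to an HTML parser such as Jsoup, since regular expressions do not cope with every kind of markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls href values out of <a> tags in an HTML string.
final class LinkExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        if (html == null) {
            return links; // nothing to scan
        }
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the text between the quotes
        }
        return links;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by getResponseBodyAsString().
        String body = "<html><body><a href=\"/page1.html\">One</a> "
                + "<A HREF='http://my.site.com/two'>Two</A></body></html>";
        System.out.println(extractLinks(body));
    }
}
```

The links come back exactly as written in the page, so relative ones such as `/page1.html` still need to be resolved against the page URL before they are fetched.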
I have the following code, which calls an API built in PHP. The API returns the JSON shown below, which I collect in a StringBuilder object. The problem is that it works on some carriers, but on a few devices with other carriers or a Wi-Fi connection it throws a JSONException ("End of input at character 0"). I know this happens when the input string is empty, which means the StringBuilder object is empty. I don't have access to the devices on which these errors occur.
I don't understand why the following code returns an empty string on some devices and works fine on others. The user has tested on 3G as well as Wi-Fi; these devices are in another country on different carriers.
HttpClient httpClient = HttpClientBuilder.create().build();
HttpPost postRequest = new HttpPost(ServiceUrls.base_url + ServiceUrls.get_profile_url);
JSONObject object = new JSONObject();
object.put("username", params[0]);
StringEntity input = new StringEntity(object.toString());
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);
if (response.getStatusLine().getStatusCode() != 200) {
throw new RuntimeException("Failed : HTTP error code : "
+ response.getStatusLine().getStatusCode());
}
BufferedReader br = new BufferedReader(
new InputStreamReader((response.getEntity().getContent())));
String output;
StringBuilder stringBuilder = new StringBuilder();
while ((output = br.readLine()) != null) {
stringBuilder.append(output);
}
If it happened for every API call it would make sense, but it doesn't happen for the other API calls. This API call returns a larger JSON string than the others, as follows, in the StringBuilder:
{
"status":1,
"parking":{
"name":"ghgjjghghg",
"cost":3,
"ownerId":29,
"address":"xyz pqr",
"slots":4,
"image":"d4bc95c1dd031685746f2c3570788acf.jpg",
"details":"gjhjghjgg",
"amenities":"gjhg",
"id":70,
"lon":73.7898023,
"lat":19.9974533,
"type":0,
"available":1
},
"rating":0,
"ratingCount":0,
"owner":{
"id":29,
"username":"vd#gmail.com",
"password":"",
"fullname":"vi hdjh",
"phone":"23434fddf",
"ccNum":null,
"ccType":null,
"type":1,
"authType":1,
"image":"582e3a77d76ae3203cfd6d6a346da429.jpg",
"dni":"abc123",
"account":"ABCBANK"
}
}
I have no clue what's happening. Please help; any input will be appreciated.
There is nothing unusual about the code you posted. No clues there.
Let me summarize what I think you said about the symptoms.
For some devices / carriers, a specific API call fails. But not all devices / carriers.
For the same devices / carriers as above, other API calls work, all of the time.
The client-side code is identical in all cases, apart from the URLs.
To me, this points to a problem on the server side that is triggered by what the requests look like to it. Either way, I would investigate by looking at the requests and responses on the server side and checking the server-side logs. See if there are significant differences between the requests coming from different devices / carriers, especially between the ones that work and the ones that don't. And see if the responses are already empty when the server sends them.
I found Leonidos's theory useful:
https://stackoverflow.com/a/19540249/6076711
And here is my part of the solution; you can try the following code.
String output = "";
String line;
// Read each line exactly once; calling readLine() twice per iteration skips every other line.
while (br.ready() && (line = br.readLine()) != null)
{
output += line + "\n";
}
The code can be improved by checking, before putting the content into the StringBuilder, whether its length is greater than 0.
HttpClient httpClient = HttpClientBuilder.create().build();
HttpPost postRequest = new HttpPost(ServiceUrls.base_url +
ServiceUrls.get_profile_url);
JSONObject object = new JSONObject();
object.put("username", params[0]);
StringEntity input = new StringEntity(object.toString());
input.setContentType("application/json");
postRequest.setEntity(input);
HttpResponse response = httpClient.execute(postRequest);
String output = null;
if (response.getStatusLine().getStatusCode() != 200) {
throw new RuntimeException("Failed : HTTP error code : "
+ response.getStatusLine().getStatusCode());
} else {
HttpEntity entity = response.getEntity();
if ((entity != null) && (entity.getContentLength() != 0)) {
// Use writing to output stream to prevent problems with chunked responses
ByteArrayOutputStream os = new ByteArrayOutputStream();
entity.writeTo(os);
output = new String(os.toByteArray(),"UTF-8");
} else {
// Handle the case where no content is returned
System.out.println("Received no content (HTTP response code " + response.getStatusLine().getStatusCode() + ", reason: " + response.getStatusLine().getReasonPhrase() + ")");
}
}
The code above, however, doesn't explain why you get an empty response. It only handles the empty response in a more elegant way.
I noticed, however, that you require a username in the request. Are you sure the user exists on the device? And if the user does not exist, should something else be returned?
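The "read everything, then check for an empty body before parsing" idea from the answers above can be isolated from HttpClient and tried on any InputStream. The sketch below uses only the JDK; the names `BodyReader`, `readFully` and `looksLikeJson` are invented for illustration, and it assumes the server sends UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: drains a response stream and guards against empty bodies.
final class BodyReader {
    // Reads the whole stream as UTF-8 (an assumption); a null stream yields "".
    static String readFully(InputStream in) {
        if (in == null) {
            return "";
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n); // copy only the bytes actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Cheap pre-check so an empty or non-JSON body never reaches the JSON parser.
    static boolean looksLikeJson(String body) {
        String trimmed = body == null ? "" : body.trim();
        return trimmed.startsWith("{") || trimmed.startsWith("[");
    }
}
```

With a guard like this, the client can log the status code and headers of the failing response instead of crashing with "End of input at character 0", which gives you something concrete to compare against the server-side logs.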
I am looking to interact with a Documentum Repository using their REST API. I would like to use the http-client 4.3 jars to perform this interaction.
I was hoping someone might have a sample that would help point me in the correct direction on how to interact with DCTM.
I am having trouble finding a clear and simple example of how to do this.
Thanks
I know it is a bit late to answer this question, but I want to help those who still need code for making requests to the REST API. Here is a full example of sending a POST request to the REST API to start a workflow.
For other needs, you can check the document called Documentum xCP REST Services provided by EMC: https://support.emc.com/docu52500_Documentum-xCP-REST-Services-2.1-Development-Guide.pdf?language=en_US&request=akamai and compare it with this example, changing it according to your needs.
UPDATE:
Also, if you are not using xCP, here is the documentation for the REST API without it: emc.com/collateral/TechnicalDocument/docu57895.pdf
You can also check my answer to "How can I use REST to copy an object in Documentum 7.x" for getting object data and content from the REST API (without xCP).
String strResponse = "";
String process_id = "system_name of the process you want to start";
String url = "Your App Url Here/processes/" + process_id;
String json = "{"+
"\"run-stateless\" : \"false\","+
"\"data\" :"+
" { "+
" \"variables\" : "+
" { \"Variable name\" : \"Variable value\" } "+
" } "+
"}";
CloseableHttpClient httpClient = HttpClientBuilder.create().build();
BufferedReader rd = null;
CloseableHttpResponse cls = null;
try {
HttpPost request = new HttpPost(url);
// set timeouts as you like
RequestConfig config = RequestConfig.custom()
.setSocketTimeout(60 * 1000).setConnectTimeout(20 * 1000)
.setConnectionRequestTimeout(20 * 1000).build();
request.setConfig(config);
StringEntity params = new StringEntity(json);
request.addHeader("Accept", "application/json");
request.addHeader(
"Authorization",
"Basic "
+ com.documentum.xmlconfig.util.Base64
.encode("username here" + ":"
+ "password here"));
request.addHeader("Content-Type", "application/vnd.emc.xcp+json");
request.setEntity(params);
try {
cls = httpClient.execute(request);
HttpEntity entity = cls.getEntity();
rd = new BufferedReader(new InputStreamReader(
entity.getContent()));
String line;
// Stop at end of stream so the literal "null" is never appended to the response
while ((line = rd.readLine()) != null) {
strResponse += line;
}
strResponse = strResponse.trim().replace("\n", "");
String statusline = cls.getStatusLine().toString();
if (!statusline.contains("200") && !statusline.contains("201")) {
Log.write("Process is not started");
// log the strResponse or do something with it
} else {
System.out.println("Process started successfully");
}
} catch (Exception e) {
e.printStackTrace();
}
} finally {
// using commons-io-2.4.jar
IOUtils.closeQuietly(httpClient);
IOUtils.closeQuietly(cls);
IOUtils.closeQuietly(rd);
}
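The Authorization header in the example above relies on a Base64 class shipped with Documentum. If that class is not on your classpath, the same header value can be built with the JDK alone (Java 8 or later). This is a sketch rather than the author's method; `BasicAuth` and `headerValue` are names made up here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP Basic "Authorization" header.
final class BasicAuth {
    static String headerValue(String user, String password) {
        // Basic auth is the Base64 encoding of "user:password".
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }
}
```

In the example above, the result would be passed as the second argument of `request.addHeader("Authorization", ...)`.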
I am trying to get the html source of a webpage through java code using Jsoup. Below is the code I am using to fetch the page. I am getting a 500 Internal Server Error.
String encodedUrl = URIUtil.encodePathQuery(urlToFetch.trim(), "ISO-8859-1");
Response res = Jsoup.connect(encodedUrl)
.header("Accept-Language", "en")
.userAgent(userAgent)
.data(data)
.maxBodySize(bodySize)
.ignoreHttpErrors(true)
.ignoreContentType(true)
.timeout(10000)
.execute();
However, when I fetch the same page with wget from command line, it works. A simple HttpClient from code also works.
// Create an instance of HttpClient.
HttpClient client = new HttpClient();
// Create a method instance.
GetMethod method = new GetMethod(url);
// Provide a custom retry handler if necessary
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler(3, false));
try {
// Execute the method.
int statusCode = client.executeMethod(method);
if (statusCode != HttpStatus.SC_OK) {
System.err.println("Method failed: " + method.getStatusLine());
}
// Read the response body.
byte[] responseBody = method.getResponseBody();
// Deal with the response.
// Use caution: ensure correct character encoding and is not binary data
System.out.println(new String(responseBody));
} catch (HttpException e) {
System.err.println("Fatal protocol violation: " + e.getMessage());
e.printStackTrace();
} catch (IOException e) {
System.err.println("Fatal transport error: " + e.getMessage());
e.printStackTrace();
} finally {
// Release the connection.
method.releaseConnection();
}
Is there anything I would need to change in the parameters for the Jsoup.connect() method for it to work?
This however does not happen for all urls. It is specifically happening for pages from this website:
http://xyo.net/iphone-app/instagram-RrkBUFE/
You need an Accept header.
Try this:
String encodedUrl = "http://xyo.net/iphone-app/instagram-RrkBUFE/";
Response res = Jsoup.connect(encodedUrl)
.header("Accept-Language", "en")
.ignoreHttpErrors(true)
.ignoreContentType(true)
.header("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8")
.followRedirects(true)
.timeout(10000)
.method(Connection.Method.GET)
.execute();
System.out.println(res.parse());
It works.
Please also note that the site is trying to set cookies; you may need to handle them.
Hope it will help.
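To check whether the Accept header really is what the server reacts to, the same request can be prepared with the JDK's HttpURLConnection, independently of Jsoup. The sketch below only configures the connection and sends nothing until you read from it; `BrowserLikeRequest` and `open` are hypothetical names, and the header values and the 10-second timeout are copied from the snippets above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: prepares a GET with browser-like headers.
final class BrowserLikeRequest {
    static HttpURLConnection open(String url) {
        try {
            // openConnection() only creates the object; no network traffic yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Accept",
                    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
            conn.setRequestProperty("Accept-Language", "en");
            conn.setInstanceFollowRedirects(true);
            conn.setConnectTimeout(10000);
            conn.setReadTimeout(10000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `getResponseCode()` on the returned connection performs the request, so you can compare the status with and without the Accept line.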
I have a problem with a WebService on Android. I am getting a 400 error but there is no information on the ErrorStream.
What I am trying to do is a POST request on a WCF Webservice using JSON.
I must add that I have includeExceptionDetailInFaults Enabled on my Service. The last time I got a 400 error, it was because I hadn't defined the RequestProperty. Now I don't get any error in the stream.
Here is the code:
HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
try {
// In my last error I had not included these lines. Maybe they are still wrong?
urlConnection.setRequestProperty("Content-Type", "application/json");
urlConnection.setRequestProperty("Accept", "application/json");
urlConnection.setRequestMethod("POST");
urlConnection.setDoOutput(true);
urlConnection.setChunkedStreamingMode(0);
OutputStream out = new BufferedOutputStream(urlConnection.getOutputStream());
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(out);
outputStreamWriter.write(jsonObject.toString(), 0, jsonObject.length());
outputStreamWriter.flush();
//outputStreamWriter.close();
int code = urlConnection.getResponseCode();
System.out.println(code);
if(code == 400) {
BufferedInputStream errorStream = new BufferedInputStream(urlConnection.getErrorStream());
InputStreamReader errorStreamReader = new InputStreamReader(errorStream);
BufferedReader bufferedReader = new BufferedReader(errorStreamReader);
StringBuilder builder = new StringBuilder();
String aux = "";
while ((aux = bufferedReader.readLine()) != null) {
builder.append(aux);
}
String output = builder.toString(); // The output is empty.
System.out.print(output);
}
Check out the Retrofit library from Square; it is simpler and lighter for GET/POST requests, especially for REST. I suggest you try it. It will make your life easier.
You can use different JSON parsers, error handlers, etc. Very flexible.
A POST request definition using Retrofit is as simple as this:
An object can be specified for use as an HTTP request body with the @Body annotation.
@POST("/users/new")
void createUser(@Body User user, Callback<User> cb);
Methods can also be declared to send form-encoded and multipart data.
Form-encoded data is sent when @FormUrlEncoded is present on the method. Each key-value pair is annotated with @Field containing the name and the object providing the value.
@FormUrlEncoded
@POST("/user/edit")
User updateUser(@Field("first_name") String first, @Field("last_name") String last);
After you define the method inside your Java interface as shown above, instantiate it:
RestAdapter restAdapter = new RestAdapter.Builder()
.setEndpoint("https://api.soundcloud.com")
.build();
MyInterface service = restAdapter.create(MyInterface.class);
And then you can call your method synchronously or asynchronously (if you pass a Callback instance).
service.myapi(requestBody);
See Retrofit documentation (http://square.github.io/retrofit/javadoc/index.html) and samples on GitHub for more details.
A 400 error might be occurring (and usually occurs in my case) because of an incorrect URL or badly formatted JSON in the POST body. Please check those two.
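One way a JSON body can end up malformed on the wire is truncation. In the question's code the write call is bounded by `jsonObject.length()`, which in org.json is the number of name/value pairs in the object, not the number of characters in its string form, so only the first few characters would be sent. Whether that is the cause of this particular 400 is an assumption, but it is cheap to rule out by writing the whole string as bytes. The helper below is a JDK-only sketch; `JsonBody` and `write` are names made up for the example, and UTF-8 is assumed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: writes a complete JSON string to a request body stream.
final class JsonBody {
    // Returns the number of bytes written, which can also serve as Content-Length.
    static int write(OutputStream out, String json) {
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        try {
            out.write(bytes); // the whole payload, not a prefix
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.length;
    }
}
```

In the question's code this would be called as `JsonBody.write(urlConnection.getOutputStream(), jsonObject.toString())`. Note also that `getErrorStream()` can return null when the server sends no error body, so it is worth checking for null before wrapping it in a reader.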
Using an HttpPost object will make your job a lot easier, in my opinion:
HttpPost post = new HttpPost(url);
if(payload != null){
try {
StringEntity entity = new StringEntity(payload,HTTP.UTF_8);
entity.setContentType(contentType);
post.setEntity(entity);
} catch (UnsupportedEncodingException e) {
LOG.d(TAG, "post err url : " + url);
LOG.e(TAG, "post err url" , e);
throw new Exception(1, e);
}
}
HttpResponse response=executeRequest(owner, post);
My Galaxy Nexus arrived today, and one of the first things I did was to load my app onto it so I could demonstrate it to my friends. Part of its functionality involves importing RSS Feeds from Google Reader. However, upon trying this, I was getting 405 Method Not Allowed errors.
This problem is Ice Cream Sandwich-specific. The code I've attached works fine on Gingerbread and Honeycomb. I've traced the error down to the moment the connection is made, when the GET request magically turns into a POST request.
/**
* Get the authentication token from Google
* @param auth The Auth Key generated in getAuth()
* @return The authentication token
*/
private String getToken(String auth) {
final String tokenAddress = "https://www.google.com/reader/api/0/token";
String response = "";
URL tokenUrl;
try {
tokenUrl = new URL(tokenAddress);
HttpURLConnection connection = (HttpURLConnection) tokenUrl.openConnection();
connection.setRequestMethod("GET");
connection.addRequestProperty("Authorization", "GoogleLogin auth=" + auth);
connection.setRequestProperty("Content-Type","application/x-www-form-urlendcoded");
connection.setUseCaches(false);
connection.setDoOutput(true);
Log.d(TAG, "Initial method: " + connection.getRequestMethod()); // Still GET at this point
try {
connection.connect();
Log.d(TAG, "Connected. Method is: " + connection.getRequestMethod()); // Has now turned into POST, causing the 405 error
InputStream in = new BufferedInputStream(connection.getInputStream());
response = convertStreamToString(in);
connection.disconnect();
return response;
}
catch (Exception e) {
Log.d(TAG, "Something bad happened, response code was " + connection.getResponseCode()); // Error 405
Log.d(TAG, "Method was " + connection.getRequestMethod()); // POST again
Log.d(TAG, "Auth string was " + auth);
e.printStackTrace();
connection.disconnect();
return null;
}
}
catch(Exception e) {
// Stuff
Log.d(TAG, "Something bad happened.");
e.printStackTrace();
return null;
}
}
Is there anything that could be causing this problem? Could this function be better coded to avoid this problem?
Many thanks in advance.
This behaviour is described in Android Developers: HttpURLConnection
HttpURLConnection uses the GET method by default. It will use POST if
setDoOutput(true) has been called.
What's strange, though, is that this was not actually the behaviour before 4.0, so I would imagine it's going to break many existing published apps.
There is more on this at Android 4.0 turns GET into POST.
Removing this line worked for me:
connection.setDoOutput(true);
With this line, 4.0 assumes the request should definitely be a POST.
Get rid of this:
connection.setRequestProperty("Content-Type","application/x-www-form-urlendcoded");
This tells the API this is a POST.
UPDATE on how it could be done via HttpClient:
String response = null;
HttpClient httpclient = null;
try {
HttpGet httpget = new HttpGet(yourUrl);
httpget.setHeader("Authorization", "GoogleLogin auth=" + auth);
httpclient = new DefaultHttpClient();
HttpResponse httpResponse = httpclient.execute(httpget);
final int statusCode = httpResponse.getStatusLine().getStatusCode();
if (statusCode != HttpStatus.SC_OK) {
throw new Exception("Got HTTP " + statusCode
+ " (" + httpResponse.getStatusLine().getReasonPhrase() + ')');
}
response = EntityUtils.toString(httpResponse.getEntity(), HTTP.UTF_8);
} catch (Exception e) {
e.printStackTrace();
// do some error processing here
} finally {
if (httpclient != null) {
httpclient.getConnectionManager().shutdown();
}
}
This is one that got me: basically, calling setDoOutput(true) forces a POST request when you make the connection, even if you specify GET in setRequestMethod:
HttpURLConnection uses the GET method by default. It will use POST if
setDoOutput(true) has been called. Other HTTP methods (OPTIONS, HEAD,
PUT, DELETE and TRACE) can be used with setRequestMethod(String).
This caught me a while back - very frustrating ...
See http://developer.android.com/reference/java/net/HttpURLConnection.html and go to the HTTP Methods heading.
I've found that pre-ICS one could get away with making a body-less POST without providing a Content-Length value; post-ICS, however, you must set Content-Length: 0.
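Putting the answers above together, a GET connection is safest when it is set up without `setDoOutput(true)` and without a Content-Type header, since a GET carries no body. The following is a minimal sketch using only java.net; `GetConnections` and `openGet` are hypothetical names, and the Authorization value is whatever your service expects (the question used "GoogleLogin auth=" plus the auth key).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: opens a connection configured strictly for GET.
final class GetConnections {
    static HttpURLConnection openGet(String url, String authorization) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            if (authorization != null) {
                conn.setRequestProperty("Authorization", authorization);
            }
            conn.setUseCaches(false);
            // Deliberately no setDoOutput(true) and no Content-Type:
            // on Android 4.0 the former turns the request into a POST.
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Logging `conn.getRequestMethod()` and `conn.getDoOutput()` right before connecting, as the question already does for the method, is a quick way to confirm that nothing else in the code path has switched the request to POST.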