I want to send a request to a dynamic website and get the response. If I do this with a normal browser (like Chrome) and view the source code, it shows me all the text (no JavaScript), but if I try wget or HttpClient I get a response full of JavaScript and no text.
The text is dynamic, so how can I receive the final source code (with the text)?
If this is not clear, please follow these steps:
1 - Go to http://www.stj.jus.br/webstj/processo/Justica/detalhe.asp?numreg=201201911000&pv=010000000000&tp=51
2 - Inspect elements and look at the source code of detalhe.asp
3 - Open a terminal and use wget to fetch the same page
Now can you see the difference?
---- EDIT ----
If it helps, I am trying to do this with HttpClient:
private static InputStream getPageSource(String url) {
    InputStream inputStream = null;
    try {
        HttpClient httpclient = new DefaultHttpClient();
        HttpResponse response = httpclient.execute(new HttpGet(url));
        StatusLine statusLine = response.getStatusLine();
        if (statusLine.getStatusCode() == HttpStatus.SC_OK) {
            // Buffer the entity once; its underlying stream can only be read a single time.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            response.getEntity().writeTo(out);
            out.close();
            String responseString = out.toString();
            //..more logic
            System.out.println(responseString);
            // Re-read from the buffer; calling getEntity().getContent() here
            // would fail because writeTo() already consumed the entity.
            inputStream = new ByteArrayInputStream(out.toByteArray());
        } else {
            // Close the connection.
            response.getEntity().getContent().close();
            throw new IOException(statusLine.getReasonPhrase());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return inputStream;
}
---- EDIT 2 ----
I got this to work by setting one header field: Referer.
If I put this line before executing the HttpClient, everything works: get.setHeader("Referer", "http://www.stj.jus.br/webstj/processo/Justica/pagina_lista.asp"). So the problem now is:
How do I get this parameter (Referer) set by HttpClient automatically?
Wget does not perform the role of a browser: it does not interpret and execute the JavaScript. It just asks for the resource at a particular URL and saves it to a file. If you want the dynamically loaded content as well, you will need access to a JavaScript engine. You may want to look at Selenium, which has a JavascriptExecutor interface.
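For example, a minimal sketch along those lines (assuming Selenium with Firefox is available; the URL is the one from the question):

import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class RenderedSource {
    public static void main(String[] args) {
        // A real browser executes the page's JavaScript for us.
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("http://www.stj.jus.br/webstj/processo/Justica/detalhe.asp"
                    + "?numreg=201201911000&pv=010000000000&tp=51");
            // Pull the DOM as it stands after the scripts have run.
            String html = (String) ((JavascriptExecutor) driver)
                    .executeScript("return document.documentElement.outerHTML;");
            System.out.println(html);
        } finally {
            driver.quit();
        }
    }
}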
Sorry about this; my problem here is with security. For security reasons, the Referer must be set to "http://www.stj.jus.br/webstj/processo/Justica/pagina_lista.asp", so there is no problem with redirects or anything like that, just security.
I couldn't see this before, which is why I posted the question.
Thanks.
I'm researching a similar issue, and the answer I keep coming across is to try http://htmlunit.sourceforge.net/, which has a JavaScript engine embedded. Depending on your environment, the disadvantage of Selenium is that it requires a browser to be installed for it to interact with.
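A rough sketch of that approach (assuming a reasonably recent HtmlUnit version, where WebClient is AutoCloseable):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitFetch {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            webClient.getOptions().setJavaScriptEnabled(true);
            HtmlPage page = webClient.getPage("http://www.stj.jus.br/webstj/processo/Justica/detalhe.asp?numreg=201201911000&pv=010000000000&tp=51");
            webClient.waitForBackgroundJavaScript(5000); // give scripts time to finish
            System.out.println(page.asXml()); // the DOM after JavaScript ran
        }
    }
}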
---- Related ----
I have implemented a PerformHttpPostRequest function which is supposed to send a POST request containing a JSON body and receive a JSON response via Apache HttpClient.
public static String PerformHttpPostRequest(String url, String requestBody) throws IOException {
    CloseableHttpClient client = HttpClients.createDefault();
    HttpPost httpPost = new HttpPost(url);
    StringEntity entity = new StringEntity(requestBody);
    httpPost.setEntity(entity);
    httpPost.setHeader("Accept", "application/json");
    httpPost.setHeader("Content-type", "application/json");
    CloseableHttpResponse response = client.execute(httpPost);
    HttpEntity httpEntity = response.getEntity();
    InputStream is = httpEntity.getContent();
    // Note: readLine() returns only the first line of the response body,
    // and neither the response nor the client is ever closed.
    return (new BufferedReader(new InputStreamReader(is, "UTF-8"))).readLine();
}
The problem is, the code works perfectly in the development environment, but when the war file runs on a Tomcat server the request is not executed.
I've tried adding several catch blocks, such as IOException and Exception, and the code never reaches them.
I've added debug prints which demonstrated that the code stops responding at the client.execute(...) command.
The function is called inside a try block, and after executing the .execute(...) command the code does reach the finally block.
I've already searched for a similar problem and didn't find an answer.
Is this a known issue? Does anyone have any idea of what can cause it, or how I can fix it?
Hi Talor, nice to meet you.
Please try using HttpURLConnection to solve this issue, like so:
Java - sending HTTP parameters via POST method easily
Have a nice day.
el profesor
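For reference, a minimal sketch of the HttpURLConnection route for a JSON POST (the method name postJson is just a placeholder):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public static String postJson(String url, String requestBody) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true); // we intend to write a request body
    conn.setRequestProperty("Accept", "application/json");
    conn.setRequestProperty("Content-Type", "application/json");
    try (OutputStream os = conn.getOutputStream()) {
        os.write(requestBody.getBytes(StandardCharsets.UTF_8));
    }
    try (Scanner scanner = new Scanner(conn.getInputStream(), "UTF-8")) {
        return scanner.useDelimiter("\\A").next(); // read the whole response body
    } finally {
        conn.disconnect();
    }
}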
I have tried this with RestTemplate:

RequestObject requestObject = new RequestObject();
requestObject.setKey("abcd");
requestObject.setEndpoint(serviceEndPoint);

RestTemplate restTemplate = new RestTemplate();
HttpEntity<RequestObject> requestBody = new HttpEntity<RequestObject>(requestObject);
ResponseEntity<RequestObject> result = restTemplate.postForEntity(serviceEndPoint, requestBody, RequestObject.class);

It's very simple and hassle-free; hope it helps.
A few things you can try:
- Do a ping/curl from the box where you are running Tomcat.
- Write a test method which makes a GET request to a server that is always reachable, for example google.com, and print the status (see the sketch below). That way you can tell whether your code is actually working in the server environment.
Hope this helps. :)
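Something like this throwaway check, deployed to the Tomcat box, would confirm basic outbound connectivity (the target host and method name are just examples):

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public static void connectivityCheck() {
    try {
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://www.google.com").openConnection();
        System.out.println("Status: " + conn.getResponseCode()); // e.g. 200
        conn.disconnect();
    } catch (IOException e) {
        e.printStackTrace(); // likely a network problem on the server box
    }
}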
If the code doesn't pass beyond client.execute(...) but it does execute the finally block in the calling code, then you can find out what caused the aborted execution by adding this catch block to the try block that contains the finally:
catch (Throwable x) {
    x.printStackTrace();
}

Throwable is the superclass of all exception and error classes, so catching a Throwable will catch everything.
I am doing this as part of enhancing a Selenium WebDriver script.
I have tried using HttpClient with Java and a lot of other things, but I am not able to get anywhere.
Please help!
This is the scenario:
After a certain action is performed on a webpage, like a button click,
GET/POST requests can be seen in the Developer Tools in Chrome.
I have taken the example of Google here.
What I need is to collect all the resource names until a certain resource appears (if you open the Developer Tools in Chrome and navigate to google.com, under the Network tab in the leftmost column you will see tia.png, just as an example).
There are two things that should be achieved:
ensure that a certain resource was loaded
make sure the page is completely loaded (all GET/POST requests have been transferred) before any other action is taken.
HttpClient and HttpURLConnection only capture one request, but a page sends a lot of requests. How do we capture all of them?
Using the Apache HTTP client, you can do something like this:
To solve your problem, you can get the response code with response.getStatusLine().getStatusCode(), check it against the expected response code, and work with the content of the response.
URI url = new URI("String URI");
HttpPost post = new HttpPost(url);
HttpResponse response = HttpClientBuilder.create().build().execute(post);
System.out.println("Response Code : " + response.getStatusLine().getStatusCode());

BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
    result.append(line);
}
System.out.println("Response 1 >>>>" + result.toString());
And if you need to send parameters in your POST request:
List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
urlParameters.add(new BasicNameValuePair("username", "yourusername"));
urlParameters.add(new BasicNameValuePair("password", "yourPassword"));
post.setEntity(new UrlEncodedFormEntity(urlParameters));
This is the answer to my question:
https://kenneth.io/blog/2014/12/28/taking-chrome-devtools-outside-the-browser/
https://github.com/auchenberg/chrome-devtools-app
From the blog:
It's a standalone app that runs Chrome DevTools in its own process. It's powered by node-webkit, and it's able to run on Windows, Mac and Linux, completely independently of Chrome.
I wanted to capture all the requests (as seen in the screenshot) between the application and the server so that I could ensure that all the critical resources and requests had been transferred back and forth and a page had been completely loaded.
I was unable to do this programmatically, i.e. perform what we see in the Chrome Dev Tools under the Network tab in the image here.
Kenneth's work lets one achieve this to a degree.
I am having a mess of a time finding up-to-date information on sending a JSON request to a local server. I keep coming across examples that use deprecated code, and I'd really like to do this with code that isn't.
I can at least say that I now have a working example, and I am not getting any deprecation warnings from NetBeans, but I would like to know if what I've put together is the right way:
public void sendUpdateRequest() {
    String updateString =
            "{\"jsonrpc\": \"2.0\", \"method\": \"VideoLibrary.Scan\"}";
    StringEntity entity = new StringEntity(updateString, Consts.UTF_8);
    HttpPost httpPost = new HttpPost(getURL()); // http://xbmc:xbmc#10.0.0.151:8080/jsonrpc
    entity.setContentType("application/json");
    httpPost.setEntity(entity);
    try (CloseableHttpClient client = HttpClientBuilder.create().build()) {
        HttpResponse response = client.execute(httpPost);
        System.out.println(response.getStatusLine()); // move to log
    } catch (IOException e) {
        e.printStackTrace(); // move to log
    }
}
This is something I'm working on to update XBMC with a JSON HTTP request.
Edit
Changed the code to use try-with-resources per the comment. Hopefully this will be useful for someone else dealing with JSON and Java.
but I would like to know if what I've put together is the right way:
Yes, you are doing it correctly given the details you've posted.
The StringEntity contains the body of the request. You can set any appropriate headers there. Any other headers can be set directly on the HttpPost object.
As stated in the comments, don't take any chances, close() the CloseableHttpClient in a finally block.
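Spelled out without try-with-resources, that advice looks roughly like this (a sketch based on the question's code, inside a method that may throw IOException):

CloseableHttpClient client = HttpClientBuilder.create().build();
try {
    HttpResponse response = client.execute(httpPost);
    System.out.println(response.getStatusLine());
} finally {
    client.close(); // runs whether execute() succeeded or threw
}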
I have a WebView which displays the mobile version of a site. I've made a "switch" which allows the user to change to the full version of the site if he wants.
So at first the user goes to the site's mobile version, and after that he may toggle the "switch" and the site's full version is loaded.
What's happening now: for some sites, just changing the user-agent is enough, and they load as if from a PC when the "switch" is toggled. But some sites can still detect that I've come from a mobile device and keep showing me the mobile version.
How can I tell ALL sites that "I" am not a mobile device, but a PC?
Something like:
webView.getSettings().iAmPC(true);
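For reference, the user-agent change I mentioned is done roughly like this (the desktop UA string is just an example):

WebSettings settings = webView.getSettings();
// Pretend to be a desktop browser; any desktop UA string will do here.
settings.setUserAgentString("Mozilla/5.0 (Windows NT 6.1; Win64; x64) "
        + "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36");
settings.setUseWideViewPort(true); // lay the page out with a desktop-style viewport
webView.reload();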
P.S.: For example, Opera Mobile for Android (and Firefox) have this functionality: if I choose "Desktop" in the preferences, EVERY site gives me its full version. The Android 2.3.6 default browser does not.
P.P.S.: It would also be useful to know how to achieve this even outside a WebView.
Update: It seems that the X-WAP-Profile header should be changed, but I still haven't found a solution. There is a kind of solution mentioned here, but it seems to be unusable in-app.
P.P.P.S.: My app has root access, so solutions which require root access are also accepted.
I tried shouldOverrideUrlLoading with a random or empty Accept header; no effect.
Some websites query the navigator object with a script to check the browser brand/version. Opera overrides that, and you might want to, too.
The x-wap-profile header is added in the FrameLoader of WebKit, so changing the firmware would be required to remove this header.
What you could do is use your own HttpClient and fetch the content yourself. This has been tried in this post.
Edit: updated.
DefaultHttpClient client = new DefaultHttpClient();

comments.setWebViewClient(new WebViewClient() {
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {
        String content = "";
        try {
            content = getUrlContent(url);
        } catch (Exception e) {
            e.printStackTrace();
        }
        view.loadDataWithBaseURL("BaseWebUrl", content, "text/html", "utf-8", "");
        return true;
    }
});

synchronized String getUrlContent(String url) throws IOException {
    // Create a client and set a specific user-agent string
    HttpClient client = new DefaultHttpClient();
    HttpGet request = new HttpGet(url);
    //request.setHeader("User-Agent", sUserAgent);
    HttpResponse response = client.execute(request);
    // Check whether the server response is valid
    StatusLine status = response.getStatusLine();
    if (status.getStatusCode() != HttpStatus.SC_OK) {
        return "Error";
    }
    // Pull the content stream from the response
    HttpEntity entity = response.getEntity();
    return EntityUtils.toString(entity);
}
I'm building a simple web scraper and I need to fetch the same page a few hundred times. There's an attribute in the page that is dynamic and should change on each request. I've built a multithreaded HttpClient-based class to process the requests, and I'm using an ExecutorService to make a thread pool and run the threads. The problem is that the dynamic attribute sometimes doesn't change on each request, and I end up getting the same value on 3 or 4 subsequent threads. I've read a lot about HttpClient and I really can't find where this problem comes from. Could it be something about caching, or something like it?
Update: here is the code executed in each thread:
HttpContext localContext = new BasicHttpContext();
HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, HTTP.DEFAULT_CONTENT_CHARSET);
HttpProtocolParams.setUseExpectContinue(params, true);

ClientConnectionManager connman = new ThreadSafeClientConnManager();
DefaultHttpClient httpclient = new DefaultHttpClient(connman, params);

HttpHost proxy = new HttpHost(inc_proxy, Integer.valueOf(inc_port));
httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)");

String iden = null;
int timeoutConnection = 10000;
HttpConnectionParams.setConnectionTimeout(httpGet.getParams(), timeoutConnection);

try {
    HttpResponse response = httpclient.execute(httpGet, localContext);
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        String result = convertStreamToString(instream);
        // System.out.printf("Result\n %s", result + "\n");
        instream.close();
        // Extract the hidden form field we are after.
        iden = StringUtils.substringBetween(result,
                "<input name=\"iden\" value=\"",
                "\" type=\"hidden\"/>");
        System.out.printf("IDEN:%s\n", iden);
        EntityUtils.consume(entity);
    }
} catch (ClientProtocolException e) {
    System.out.println("ClientProtocolException");
} catch (IOException e) {
    System.out.println("IOException");
}
HttpClient does not use a cache by default (when you use the DefaultHttpClient class only). It does so if you use CachingHttpClient, which is an HttpClient interface decorator that enables caching:

HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);

It then analyzes the If-Modified-Since and If-None-Match headers in order to decide whether to perform the request to the remote server or to return the result from the cache.
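The cacheConfiguration above could be built like this (a sketch against the 4.1/4.2 httpclient-cache API, where CacheConfig from org.apache.http.impl.client.cache is still mutable; the limits shown are arbitrary examples):

CacheConfig cacheConfiguration = new CacheConfig();
cacheConfiguration.setMaxCacheEntries(1000);    // at most 1000 cached responses
cacheConfiguration.setMaxObjectSizeBytes(8192); // don't cache bodies over 8 KB
HttpClient client = new CachingHttpClient(new DefaultHttpClient(), cacheConfiguration);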
I suspect that your issue is caused by a proxy server standing between your application and the remote server.
You can test this easily with curl; execute some number of requests, omitting the proxy:
#!/bin/bash
for i in {1..50}
do
    echo "*** Performing request number $i"
    curl -D - http://yourserveraddress.com -o $i -s
done
Then run diff between all the downloaded files; all of them should show the differences you mentioned. Next, add the -x/--proxy <host[:port]> option to curl, execute the script again, and compare the files. If some responses are now identical to others, you can be sure that this is a proxy server issue.
Generally speaking, in order to test whether or not HTTP requests are being made over the wire, you can use a "sniffing" tool that analyzes network traffic, for example:
Fiddler ( http://fiddler2.com/fiddler2/ ) - I would start with this
Wireshark ( http://www.wireshark.org/ ) - more low level
I highly doubt HttpClient is performing caching of any sort (this would imply it needs to store pages in memory or on disk, which is not one of its capabilities).
While this is not an answer, it's a point to ponder: is it possible that the server (or some proxy in between) is returning cached content? If you are performing many requests (simultaneously or near-simultaneously) for the same content, the server may be returning cached content because it has decided that the information has not "expired" yet. In fact, the HTTP protocol provides caching directives for exactly this kind of functionality. Here is a site that provides a high-level overview of the different HTTP caching mechanisms:
http://betterexplained.com/articles/how-to-optimize-your-site-with-http-caching/
I hope this gives you a starting point. If you have already considered these avenues then that's great.
You could try appending a unique dummy parameter to the URL on every request to try to defeat any URL-based caching (in the server, or somewhere along the way). It won't work if caching isn't the problem, if the server is smart enough to reject requests with unknown parameters, if the server caches based only on the parameters it cares about, or if your chosen parameter name collides with one the site actually uses.
If this is the URL you're using
http://www.example.org/index.html
try using
http://www.example.org/index.html?dummy=1
Set dummy to a different value for each request.
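In the HttpClient code from the question, that could look something like the following sketch (the parameter name dummy is arbitrary):

// Append a unique cache-busting value to every request URL.
String requestUrl = url + (url.contains("?") ? "&" : "?") + "dummy=" + System.nanoTime();
HttpGet httpGet = new HttpGet(requestUrl);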