I am trying to scrape the content of a web page using enlive's html-resource function, but I am getting a 403 response because I am not coming from a browser. I guess this can be overridden in Java (found answer here), but I would like to see a Clojure way to handle this issue. Perhaps this can be achieved by passing parameters to the html-resource function, but I have not found an example of how to do that or what needs to be passed. Any suggestion will be greatly appreciated.
Thanks.
Enlive's html-resource does not provide a way to override the default request properties. You can, as in the other answer you found, open the connection yourself, set the User-Agent on it, and pass the resulting InputStream to html-resource.
Something like the following would handle it:
(with-open [inputstream (-> (java.net.URL. "http://www.example.com/")
                            .openConnection
                            (doto (.setRequestProperty "User-Agent"
                                                       "Mozilla/5.0 ..."))
                            .getContent)]
  (html-resource inputstream))
Although, it might look better split out into its own function.
Related
I want to use the HttpURLConnection class in Groovy to send GET and POST (with a JSON body) requests to an API. But what can I tell you: with HttpURLConnection it is so difficult. I do not know how to use it. There is not even a send method. It looks like the request is sent when you call getResponseCode(). Then you have to use an InputStream, and for POST you even have to use an OutputStream. Oh my god. What for? I am used to handy libraries like Jersey Client, but this HttpURLConnection is just a nightmare. I have to use it because I have a Jenkinsfile, and in my pipeline I need to upload something, which is possible by calling a REST API. What do you think about HttpURLConnection? Does someone know a good website with examples of a GET request and a POST request with a body?
I think this is what you are trying to do https://www.baeldung.com/httpurlconnection-post
Making it Groovy should be pretty trivial.
If you are doing an HTTP request and you want to use vanilla Java or Groovy, then I would recommend using HttpClient and HttpRequest.Builder (java.net.http, available since Java 11); the API is fairly fluent. You also supply a BodyHandler object, which determines how you get the response content (String, JSON object, whatever).
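To make that suggestion a bit more concrete, here is a minimal sketch of a GET and a JSON POST with java.net.http. The class name, the method names, and the example.invalid URLs are made up for illustration, and it assumes Java 11 or later is available wherever the pipeline runs.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class; the names here are illustrative only.
class HttpClientSketch {

    // Builds a plain GET request for the given URL.
    static HttpRequest buildGet(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Builds a POST request that carries the given JSON text as its body.
    static HttpRequest buildJsonPost(String url, String json) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the request and returns the response body as a String.
    // BodyHandlers.ofString() is the BodyHandler mentioned above.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling HttpClientSketch.send(HttpClientSketch.buildJsonPost(url, json)) performs the upload; the same calls can be written almost unchanged in Groovy.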
I am using the Play Framework with Java.
I would like to ask whether it is possible to get the route method, rather than the URL or path, from an HTTP request, because I need to change some route parameter values to be able to open web pages under a different language URL. For example:
POST /namai.html controllers.Application.postComment(String lng="ru")
POST /home.html controllers.Application.postComment(String lng)
http://localhost:9000/en/namai.html **I would get bad request call**
http://localhost:9000/ru/namai.html **Page would open but link would be not the same language.**
I can't just replace the language prefix, because I am only able to open URLs in the English language. Any ideas how to get the route method from an HTTP request? Thanks in advance.
If someone runs into the same kind of problem, the easiest way to solve it is to use Java reflection to call the reverse routes class and invoke the needed method.
I am trying to simulate HTTP requests in Java with the URL and HttpURLConnection classes. GET requests are easy to simulate, while some POST requests seem harder because most POST requests need a Cookie in the request header. Some cookies are set by the HTTP response in the Set-Cookie field, and I can get them with the methods provided by HttpURLConnection, but I found that other cookies may be set by JavaScript, and I have no way to handle them. So I wonder: is there any packaged tool to simulate HTTP requests in Java?
Try Apache Commons HttpClient:
http://hc.apache.org/httpclient-3.x/
Why do you need to generate HTTP requests? Do you want to perform some stress tests?
I'd advise using something like JMeter (you can find a brief tutorial here).
Hope this helps. It's better to avoid reinventing the wheel (if this is what you need; it wasn't clear to me from your question).
For the cookie set with JavaScript, you could try to parse the HTTP response, extract the cookie information, and set it for your next request.
For example, let's say the response contains code that calls a setCookie() function (setCookie is a user-defined JavaScript function):
...
//some javascript code
setCookie("username", "johndoe");
//some more javascript
...
then you would extract the setCookie() line and parse it for the name and value.
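A rough sketch of that parsing step in Java might look like the following. It assumes the page calls setCookie with two double-quoted string literals, exactly as in the snippet above; the class and method names are invented for illustration, and a real page may well need a different pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper; assumes calls of the form setCookie("name", "value").
class ScriptCookieSketch {

    // Matches setCookie("name", "value") with double-quoted literals only.
    private static final Pattern SET_COOKIE_CALL = Pattern.compile(
            "setCookie\\(\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*\\)");

    // Collects every name/value pair found in the response body, in order.
    static Map<String, String> extractCookies(String responseBody) {
        Map<String, String> cookies = new LinkedHashMap<>();
        Matcher m = SET_COOKIE_CALL.matcher(responseBody);
        while (m.find()) {
            cookies.put(m.group(1), m.group(2));
        }
        return cookies;
    }

    // Formats the pairs as the value of a Cookie request header.
    static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }
}
```

The resulting string can then be passed to setRequestProperty("Cookie", ...) on the next HttpURLConnection.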
I have a servlet which handles some resource files, and I need to add a response header before I forward the request to the real JSP file.
response.setHeader("a", "b");
request.getRequestDispatcher("1.jsp").forward(request, response);
I need to send that header directly to the browser, but it did not work. I tried using Firebug to watch the HTTP request and its response. How can I do that?
Try using .include(request, response) instead. It is probably a feature of .forward() to fully clear the response object before forwarding.
See http://download.oracle.com/javaee/5/api/javax/servlet/RequestDispatcher.html
How do you know that it is not working? Please read this JR thread, I believe you are expecting similar thing.
If you want to use some data added by the servlet in the 1.jsp code, I suggest you use the request.setAttribute method. response.addHeader/setHeader put data into the response's header. Generally, the data in the response header is used by the browser.
The headers are being cleaned up. Just curious, what stops you from using request.setAttribute()?
I'm trying to read the text from a website using the Java URL input stream:
URL u = new URL(str);
br3 = new BufferedReader(new InputStreamReader(u.openStream()));
while(true)
    System.out.println(br3.readLine());
This seems to work fine for most websites, but for some URL shortening services like LinkBee, the object draws a blank, e.g. linkbee.com/FUAKF. I can view the source code using an explorer; however, I repeatedly get nulls when I use the above code.
It's because those sites are just redirection services. How are you handling redirects? (a redirect has a Location: header, but no body)
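If all you need is the redirect target, one way to look at it is to switch off automatic redirect handling and read the Location header yourself. This is only a sketch: the class and method names are invented, and it assumes the service answers with a plain 3xx response.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; the names are illustrative only.
class RedirectPeek {

    // Returns the Location header of a 3xx response,
    // or null if the response is not a redirect.
    static String redirectTarget(String address) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(address).toURL().openConnection();
        // Do not follow the redirect; we only want to inspect it.
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            if (status >= 300 && status < 400) {
                return conn.getHeaderField("Location");
            }
            return null;
        } finally {
            conn.disconnect();
        }
    }
}
```

A null result would mean the server did not answer with a redirect status, in which case the redirect is probably happening some other way.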
Use an HTTP library like Commons HttpClient; the method getResponseBodyAsStream follows redirects automatically.
Barry is correct.
I just wanted to add that for certain websites there could also be JavaScript that redirects you to a different page. Something like this:
<script type="text/javascript">
<!--
window.location = "http://www.google.com/"
//-->
</script>
But in your situation it would be the headers redirecting you, based on the fact that you are getting nulls back. Just thought you might want to watch out for the JavaScript thing too.
It's true that it is a redirection service; however, I do not need to actually follow the redirection. I merely need to extract the URL that it redirects to, which can be found within the source code of the redirection website itself (in the given case, at line 81):
input type='hidden' id='urlholder' value='http://www.megaupload.com/?d=02EBRUTT'
Regardless, I don't think the stream should be giving me a complete blank, unless it reads only the body and not the headers?
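For what it's worth, once the page source is available as a String, pulling the value out of that hidden field could look roughly like this. It is a sketch that assumes the markup keeps the id attribute before the value attribute, as in the line quoted above; the class and method names are invented, and a proper HTML parser would be more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes id='urlholder' appears before value='...'.
class HiddenFieldSketch {

    // Captures the value attribute of the element whose id is urlholder.
    private static final Pattern URL_HOLDER = Pattern.compile(
            "id=['\"]urlholder['\"][^>]*?value=['\"]([^'\"]*)['\"]");

    // Returns the target URL if the hidden field is present in the page source.
    static Optional<String> findTarget(String pageSource) {
        Matcher m = URL_HOLDER.matcher(pageSource);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```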