I know what I'm asking is somewhat unusual. There is a web application (whose source code we don't have access to), and we want to expose a few of its features as web services.
I was thinking of using something like Selenium WebDriver to simulate web clicks on the application according to the web service request.
I want to know whether there is a better solution or pattern for doing this.
I should mention that the application is written in Java with Spring MVC (it is not an SPA) and Spring Security, and there is a CAS server providing SSO.
There are multiple ways to implement this. In my opinion Selenium/PhantomJS is not the best option: if the site is properly designed, you can interact with it using only the HTML it serves, or even some API, rather than loading all the CSS and executing the asynchronous JavaScript requests. Since your page is not an SPA, it's quite likely that an "API" already exists in the form of GET/POST requests, and you might be lucky enough that there's no CSRF protection.
First of all, you need to solve the authentication against the CAS. There are multiple grant types in OAuth, but what you need is an API token that gives you access to the application. This token should be added as an HTTP header or cookie in every single request. Ideally this token shouldn't expire; otherwise you'll need to implement re-authentication logic in your app.
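As a minimal sketch, assuming you have already obtained a token somehow (the header name and token format here are placeholders for whatever your CAS/OAuth handshake actually returns), a RestTemplate interceptor can attach it to every outgoing request:

```java
import org.springframework.web.client.RestTemplate;

public class TokenRestTemplateFactory {

    // "token" is a placeholder for whatever credential the CAS/OAuth handshake returns.
    public static RestTemplate withToken(String token) {
        RestTemplate restTemplate = new RestTemplate();
        restTemplate.getInterceptors().add((request, body, execution) -> {
            // Attach the token to every single request, as described above.
            request.getHeaders().add("Authorization", "Bearer " + token);
            return execution.execute(request, body);
        });
        return restTemplate;
    }
}
```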
Once the authentication part is resolved, you'll need quite a lot of patience: open the target website, open the web inspector of your preferred browser, go to the Network panel, and execute the actions that you want to run programmatically. There you'll find each request with all its headers and content, along with the response.
That's what you need to code. There are plenty of libraries to achieve this in Java. Have a look at Jsoup if you need to parse HTML; for plain GET/POST requests, go for RestTemplate (in Spring) or the JAX-RS/Jersey 2 Client.
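For example, a form POST copied from the Network panel could be replayed roughly like this (the URL, form field, and CSS selector are made up for illustration):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.RestTemplate;

public class ReplayExample {

    public static void main(String[] args) {
        RestTemplate restTemplate = new RestTemplate();

        // Rebuild the form POST exactly as seen in the Network panel.
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_FORM_URLENCODED);
        MultiValueMap<String, String> form = new LinkedMultiValueMap<>();
        form.add("query", "some value"); // field name copied from the inspector

        String html = restTemplate.postForObject(
                "https://target.example.com/search", // hypothetical endpoint
                new HttpEntity<>(form, headers),
                String.class);

        // Parse the HTML response with Jsoup and extract the part you need.
        Document doc = Jsoup.parse(html);
        System.out.println(doc.select("table.results tr").size() + " rows found");
    }
}
```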
You might consider implementing a cache layer to increase performance, if the result of a query is stable over time or you can assume that, for say 5 minutes, the response to the same query will be the same.
You can create your app in your favourite language/framework. I'd recommend starting with Spring Boot + MVC + DevTools; that contains all you need, plus Jsoup if you have to parse some HTML. Later on you can add a cache provider if needed.
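If you do add caching later, a sketch with Spring's cache abstraction could look like this; the cache name and service are illustrative, and you'd still need @EnableCaching plus a provider configured with a TTL (for example, 5 minutes):

```java
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ScrapedDataService {

    // Requires @EnableCaching and a cache provider; the provider's TTL
    // (e.g. 5 minutes) decides when the target site is actually hit again.
    @Cacheable("scrapedQueries")
    public String fetch(String query) {
        // ... run the GET/POST against the target site here ...
        return "result for " + query; // placeholder
    }
}
```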
We do something similar to access web banking on behalf of a user, scrape their account data, and obtain a credit score. In most cases we have managed to reverse-engineer the mobile apps and sniff their traffic to use undocumented APIs. In other cases, we have had to fall back to web scraping.
There are two types of applications you might have to scrape:
Data is essentially the same for any user, like product listings on Amazon
Data is specific to each user, like in a banking app
In the first case, you could have your scraper running and populating a local database, and use the local data to provide the web service. In the latter case, you cannot do that, and you need to scrape the site on the user's request.
I understand from your explanation that you are in this latter case.
When web scraping, you can run into really difficult web apps:
Some may require you to send data from previous requests in the next one
Others render most data on the client with JavaScript
If either of these applies, Selenium will make your implementation easier, though not more performant.
Implementing the first without Selenium will require a lot of trial and error to get things working, because you will be simulating the requests yourself and will need to know exactly what data the server expects from the client. If you use Selenium, you will be executing the same interactions you perform in the browser and hence sending the expected data.
Implementing the second case requires your scraper to support JavaScript. AFAIK the best support is provided by Selenium; HtmlUnit claims to provide fair support, and Jsoup provides no JavaScript support at all.
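For the JavaScript-heavy case, a minimal Selenium sketch (the URL and element id are hypothetical, and the wait constructor is Selenium 3's) would be:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class ScrapeWithSelenium {

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://target.example.com/account"); // hypothetical page
            // Wait until the client-side JavaScript has rendered the data.
            new WebDriverWait(driver, 10)
                    .until(ExpectedConditions.presenceOfElementLocated(By.id("balance")));
            String balance = driver.findElement(By.id("balance")).getText();
            System.out.println("Scraped balance: " + balance);
        } finally {
            driver.quit();
        }
    }
}
```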
Finally, if your solution takes too much time, you can mitigate the problem by providing your web service with a notification mechanism, similar to Webhooks or Resthooks (a sketch follows the steps below):
A client of your web service makes a request for data, providing a URI at which it would like to be notified when the results are ready.
Your service responds immediately with an id for the request and starts scraping the necessary info in the background.
If you use a skinny payload model, then when the scraping is done you store the response in your data store under an id identifying the original request. This response is exposed as a resource.
You execute an HTTP POST on the URI provided by the client, putting the URI of the response resource in the body of the request.
The client can now GET the response resource, and because the request and response share the same id, the client can correlate the two.
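A rough sketch of that flow with Spring MVC follows; all paths and names are simplified placeholders, and the persistence step is elided:

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.client.RestTemplate;

@RestController
public class ScrapeRequestController {

    private final RestTemplate restTemplate = new RestTemplate();
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    // 1. The client asks for data and tells us where it wants to be notified.
    @PostMapping("/scrape-requests")
    public ResponseEntity<String> submit(@RequestParam String callbackUri) {
        String id = UUID.randomUUID().toString();
        executor.submit(() -> {
            // 2. Scrape in the background and persist the result under "id",
            //    then POST the response-resource URI to the client's callback
            //    (skinny payload: the body carries only a link, not the data).
            restTemplate.postForLocation(callbackUri, "/scrape-responses/" + id);
        });
        return ResponseEntity.accepted().body(id); // respond immediately with the id
    }

    // 3. The client GETs the response resource and correlates it by id.
    @GetMapping("/scrape-responses/{id}")
    public String response(@PathVariable String id) {
        return "stored result for request " + id; // placeholder lookup
    }
}
```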
Selenium isn't the best way to consume web services. Selenium is primarily an automation tool, largely used for testing applications.
Assuming the services are already developed, the first thing we need to do is authenticate the user's request.
This can be done by adding an HTTP header with the key "Authorization" and the value "Basic " + Base64Encode(username + ":" + password).
If the user is valid (the login credentials match the credentials on the server), then generate a unique token, store the token on the server mapped to the user id, and set the same token in the response header or in a cookie.
By doing this we can avoid validating credentials on the following requests from the same user, by just looking for the token in the request header or cookie.
If the services are designed to check the login every time, the "Authorization" header needs to be set on every request.
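A minimal sketch of building that Basic "Authorization" header in plain Java (the URL and credentials are placeholders):

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthExample {

    public static void main(String[] args) throws Exception {
        String credentials = "someUser:somePassword"; // placeholder credentials
        String header = "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://api.example.com/resource").openConnection();
        conn.setRequestProperty("Authorization", header);

        // The server validates the credentials and can return a token
        // (header or cookie) to be echoed back on subsequent requests.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```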
I think using a WebDriver is a lot of overhead, but it depends on what you really want to achieve. With the info you provided, I would rather go with a RestTemplate implementation sending the appropriate HTTP messages to the existing web app, wrap it with a nice @Service layer, and build your web service (REST or SOAP) on top of it.
The authentication is a matter of configuration: you can pack this in a microservice with @EnableOAuth2Sso, and your RestTemplate bean, thanks to Spring Boot, will handle the underlying auth part for you.
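A sketch of that wiring with the (now legacy) Spring Security OAuth support; the resource details and URL are assumptions supplied by your Spring Boot configuration:

```java
import org.springframework.boot.autoconfigure.security.oauth2.client.EnableOAuth2Sso;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.oauth2.client.OAuth2ClientContext;
import org.springframework.security.oauth2.client.OAuth2RestTemplate;
import org.springframework.security.oauth2.client.resource.OAuth2ProtectedResourceDetails;
import org.springframework.stereotype.Service;

@Configuration
@EnableOAuth2Sso
class SsoConfig {

    // Spring Boot supplies the context and resource details from configuration;
    // this RestTemplate then attaches the SSO token to outgoing requests itself.
    @Bean
    OAuth2RestTemplate oauth2RestTemplate(OAuth2ClientContext context,
                                          OAuth2ProtectedResourceDetails details) {
        return new OAuth2RestTemplate(details, context);
    }
}

@Service
class LegacyAppClient {

    private final OAuth2RestTemplate restTemplate;

    LegacyAppClient(OAuth2RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    // Wrap each feature of the existing web app in a service method.
    public String fetchFeature() {
        return restTemplate.getForObject("https://legacy.example.com/feature", String.class);
    }
}
```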
Maybe overkill... but RPA? http://windowsitpro.com/scripting/review-automation-anywhere-enterprise
I developed an application in Java that needs to consume a set of web services from a provider.
Basically I need to call a first REST web service to connect to the system, passing the username and password in the body of the POST call.
Without using a proxy I can then make a second REST call without passing the username and password, and the web service provider recognises me.
But when I use a proxy, this second call no longer works: I get an HTTP 401 error.
Note that the first connect call works fine through the proxy as well.
Is there something conceptual that I'm missing?
Does anyone know what is happening?
Thank you
If you're making only a single login request and then making follow-up requests that do not supply credentials, the service you are consuming is probably using cookies to track a user session.
When you put a proxy between yourself and that service, and the proxy does not forward all of your HTTP header information (which is where the cookie is probably being exchanged with the service), the service will no longer recognize your session and will consider you unauthorized.
You may need to better understand how the service maintains your session (is it using Set-Cookie headers that it expects you to send back?), and you need to find out whether the proxy you are using forwards all of your headers to the service.
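For illustration, here is how a Java client could keep session cookies automatically while going through a proxy (the host, port, URLs, and form body are all placeholders); if this still 401s, that points at the proxy stripping headers:

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

public class SessionThroughProxy {

    public static void main(String[] args) throws Exception {
        // Store Set-Cookie values and send them back on subsequent requests.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        Proxy proxy = new Proxy(Proxy.Type.HTTP,
                new InetSocketAddress("proxy.example.com", 8080)); // placeholder proxy

        // First call: login (the service answers with a session cookie).
        HttpURLConnection login = (HttpURLConnection)
                new URL("https://service.example.com/connect").openConnection(proxy);
        login.setRequestMethod("POST");
        login.setDoOutput(true);
        login.getOutputStream().write("username=u&password=p".getBytes("UTF-8"));
        System.out.println("login: " + login.getResponseCode());

        // Second call: the CookieManager re-sends the session cookie; a 401 here
        // suggests the proxy is not passing the Cookie header through.
        HttpURLConnection second = (HttpURLConnection)
                new URL("https://service.example.com/data").openConnection(proxy);
        System.out.println("data: " + second.getResponseCode());
    }
}
```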
Is it possible to create a mini HTTP server that acts as a proxy, where I can receive any request from a WebView, pass that request to my HTTP proxy server running inside the app, view the raw contents of the request (HTTP headers, bodies, etc.), and handle it from my own proxy?
I can see that the Apache libraries only contain objects that allow you to create requests and handle responses, but not ones that let me create a mini HTTP server.
Thanks
I don't fully understand the question, so here's the question I am going to answer:
Is it possible to create an HTTP server that allows me to view the source code of a web page?
The answer is: yes.
Since I don't really develop for Android phones, I'm only going to list out what you should do.
So first of all you want to accept a connection from a client. Then you might want to send back an HTML page containing a form with a website URL field. If you set the method to POST, the URL can be of any length. Now your server needs to know how to receive the HTTP POST request. I don't know HTTP well enough to tell you exactly how the request and response are encoded.
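As a very rough sketch of the accept-and-respond part in plain Java (this ignores almost all of HTTP, including reading the POST body; real request parsing is more involved):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class MiniHttpServer {

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket client = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(client.getInputStream()));
                    // Read the raw request line and headers (stop at the blank line).
                    String line;
                    while ((line = in.readLine()) != null && !line.isEmpty()) {
                        System.out.println(line); // inspect raw headers here
                    }
                    // Send back a minimal response containing the URL form.
                    PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                    out.print("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n");
                    out.print("<form method=\"POST\"><input name=\"url\"><input type=\"submit\"></form>");
                    out.flush();
                }
            }
        }
    }
}
```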
I'm writing a server-side app in Java using the HttpCore library.
I have an HttpRequest and I'm trying to get the POST data sent from a form. The problem is that when I use request.getEntity() it returns null, even though when I inspect the request I'm sending with HttpFox, the POST data is clearly there.
What am I doing wrong?
There seems to be some confusion. You are sending requests from a browser to the server. The server is most likely using the servlet API, where you handle requests in the doPost(..) method of an HttpServlet. You get an HttpServletRequest from which you can read the parameters: request.getParameter("paramName").
HttpCore, on the other hand, is used to make requests, not to handle them. It is used as an HTTP client (in the role of the browser).
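A minimal servlet sketch of that pattern (the servlet path and form field name are placeholders):

```java
import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/form-handler") // placeholder path
public class FormServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // The container parses the form body; read fields by name.
        String value = request.getParameter("paramName");
        response.setContentType("text/plain");
        response.getWriter().println("received: " + value);
    }
}
```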
I am wondering how I can use Google Protocol Buffers to accept a request and send a response back to a client. I am thinking about writing a servlet which will take the request.
Is the following train of thought the correct way to implement this?
1. Have a .proto file which is the message definition for the incoming request.
2. Write a servlet which accepts this request, does various tasks like querying a database, and then sends a response. Will this response require a separate .proto message definition with all the fields that make up the response?
3. Will the client just invoke the doGet() method of my servlet and pass the request? Should it then get a response back as a protobuf object?
Any suggestions or ideas will be very much appreciated.
Typically you'd want a request message and a response message, yes. You'd also probably want a method name to describe the action; that's certainly how the built-in PB services work.
The client wouldn't invoke doGet() directly: it would make a request (probably a POST rather than a GET), and your servlet would receive it.
Now, ideally you could have a general "ProtocolBufferServlet" which could service the requests by handing them off to services implementing the appropriate interfaces.
I suggest you look at the documentation for Protocol Buffer services and the Java services generated code for more information. You could implement an RpcChannel which worked over servlets, or get the client to make the HTTP post directly. You'd probably use dependency injection of some kind at the server side to tell the servlet what was implementing the service.
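A minimal sketch of the servlet side, assuming hypothetical generated classes SearchRequest and SearchResponse produced by protoc from your .proto file:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// SearchRequest and SearchResponse are hypothetical classes generated by protoc
// from a .proto file defining one request and one response message.
public class ProtoServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // Parse the binary protobuf request straight from the request body.
        SearchRequest request = SearchRequest.parseFrom(req.getInputStream());

        // ... query the database etc. based on the request fields ...

        SearchResponse response = SearchResponse.newBuilder()
                .setResultCount(0) // hypothetical field
                .build();

        // Serialize the protobuf response into the servlet output stream.
        resp.setContentType("application/x-protobuf");
        response.writeTo(resp.getOutputStream());
    }
}
```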
Hi,
I have this up and running. I ended up sending an HTTP request as a POST to my servlet. I was able to take the request protocol buffer, read the request, do some processing, and then send back a response. It was actually really simple once I got it working. We used a single .proto file to define both the request and response message structure.