Logging REST traffic in Java EE (or HTTP in general)

I have looked around for a Java solution for logging REST calls (or HTTP in general) in a Java EE environment to a database for later analysis, but I haven't been successful so far.
My requirements are logging the request and response, of course, including some metadata such as the IP of the requesting side (client/server), the currently authenticated user, and the status code. This metadata should be searchable efficiently, so that there is not much stress on the database and you could build a log browser on top of it which stays responsive even with several million log entries.
A nice-to-have would be a log viewer that already exists and does not have to be built.
Do I have to implement this myself, or is there an existing solution I could drop in?
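If you end up rolling your own, the capturing side can be as small as a JAX-RS filter. A minimal sketch, assuming a JAX-RS (Java EE 7) stack deployed in a servlet container; LogEntryDao is a hypothetical persistence helper, not an existing library:

```java
import java.io.IOException;
import javax.inject.Inject;
import javax.servlet.http.HttpServletRequest;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerResponseContext;
import javax.ws.rs.container.ContainerResponseFilter;
import javax.ws.rs.core.Context;
import javax.ws.rs.ext.Provider;

@Provider
public class RestTrafficLogFilter implements ContainerResponseFilter {

    // Hypothetical DAO; stands in for whatever persistence you choose.
    public interface LogEntryDao {
        void save(String ip, String user, String method, java.net.URI uri, int status);
    }

    @Inject
    private LogEntryDao logEntryDao;

    @Context
    private HttpServletRequest servletRequest; // gives access to the client IP

    @Override
    public void filter(ContainerRequestContext request,
                       ContainerResponseContext response) throws IOException {
        String user = request.getSecurityContext().getUserPrincipal() == null
                ? "anonymous"
                : request.getSecurityContext().getUserPrincipal().getName();

        // One row per call, with indexes on user, IP, and status
        // so the log browser can filter efficiently.
        logEntryDao.save(
                servletRequest.getRemoteAddr(),        // requesting IP
                user,                                  // authenticated user
                request.getMethod(),                   // HTTP method
                request.getUriInfo().getRequestUri(),  // request URI
                response.getStatus());                 // response status code
    }
}
```

Capturing the full request/response bodies additionally requires a ReaderInterceptor/WriterInterceptor, since the entity streams can only be consumed once.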

Related

How can I connect to Alfresco documents through custom web application

Our Java web application uses Alfresco as its DMS. The application uses one single system user to connect to Alfresco and manages the access rights itself with some business logic.
What I'd like to accomplish now is to use the MS Office URIs to do online editing of Word documents that live in Alfresco, i.e. a URL that looks like ms-word:ofe|u|https://ourwebapp.com/documents/mydocument.docx
However, if we open our documents like this, the user would end up being able to do things in Alfresco that we don't want them to do.
Because we want to keep our documents safe and secure, we don't want users to get at the Alfresco documents "directly", but only through our app. Opening Alfresco documents directly would mean that each individual user needs a unique Alfresco username/password, which we don't have and don't want, because we already have lots and lots of documents living in Alfresco.
Surely other companies have run into this problem, i.e. using their DMS with one single system user?
What I've already tried is to make a REST endpoint. A Spring filter ensures that an authorisation header with username/password is added, and the request is forwarded to Alfresco. The response from Alfresco is then passed back to the user. However, this results in a document that is opened read-only at best. Furthermore, it doesn't seem very secure to set up a connection for the user using these system-user credentials: for all I know, the user would be able to do things in Alfresco he isn't supposed to, like editing or even viewing other documents.
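In rough outline, the endpoint looks like this (a simplified sketch; the Alfresco base URL and the system-user credentials are placeholders):

```java
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class DocumentProxyController {

    // Placeholder; the real base URL comes from configuration.
    private static final String ALFRESCO_BASE = "https://alfresco.internal/api";

    private final RestTemplate restTemplate = new RestTemplate();

    @GetMapping("/documents/{name}")
    public ResponseEntity<byte[]> fetch(@PathVariable String name) {
        HttpHeaders headers = new HttpHeaders();
        // The system user's credentials are attached server-side only
        // and never reach the browser.
        headers.setBasicAuth("systemUser", "systemPassword");
        return restTemplate.exchange(
                ALFRESCO_BASE + "/documents/" + name,
                HttpMethod.GET,
                new HttpEntity<>(headers),
                byte[].class);
    }
}
```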
There's very little documentation on how the ms-word protocol exactly works. Maybe you can point me in the right direction, or suggest some workarounds I might try?
For this to work using the SharePoint protocol (SPP) you would have to reimplement the whole protocol server in front of your application, since you control the access. There is no free or otherwise available SPP implementation I know of that you can (re)use for this.
The Alfresco protocol server may not be an option, since you can't / don't want to mirror access control from your app into Alfresco. If you get access to a system like Alfresco or SharePoint using a file protocol, you will get too many access rights, as you already described. By following the concept of an application user, you may be locked out of Alfresco's concepts for end users if you can't mirror the access logic into Alfresco.
Years ago we implemented a dynamic low-level access voter to up- or downgrade access inside Alfresco's node service, to allow specific permissions based on types and metadata. In the same way, someone could implement an interface to another system to delegate permission checks based on external data, but this would slow down all the systems involved dramatically.
We have a similar requirement, since we access documents and data from several enterprise sources, including Alfresco, from our own business process product, which has a rule- and process-based access concept based on the cases and processes the documents are involved in, not on folders or documents' static ACLs. We use a local service installed on the client, partnering with the browser app, for downloading, opening, and saving back documents after the file is closed, from a local temporary (checked-out) path. Our local client knows nothing about Alfresco and is authenticated only against our services, using JSON Web Tokens.
So my answer is more a concept than a ready-to-go solution, in the hope that it is helpful.

Exposing a web site through web services

I know what I am asking is somewhat weird. There is a web application (whose source code we don't have access to), and we want to expose a few of its features as web services.
I was thinking of using something like Selenium WebDriver, simulating web clicks on the application according to each web service request.
I want to know whether there is a better solution or pattern for doing this.
I should mention that the application is written in Java, using Spring MVC (it is not an SPA) and Spring Security, and that there is a CAS server providing SSO.
There are multiple ways to implement this. In my opinion Selenium/PhantomJS is not the best option: if the web app is properly designed, you can interact with it using only the served HTML, or even some API, rather than needing all the CSS and executing the asynchronous JavaScript requests. Since your page is not an SPA, it's quite likely that an "API" already exists in the form of GET/POST requests, and you might be lucky enough that there's no CSRF protection.
First of all, you need to solve the authentication against the CAS server. There are multiple types of authentication in OAuth, but you should get an API token that grants you access to the application. This token should be added as an HTTP header or cookie to every single request. Ideally this token shouldn't expire; otherwise you'll need to implement re-authentication logic in your app.
Once the authentication part is resolved, you'll need quite a lot of patience: open the target website with the web inspector of your preferred browser, go to the Network panel, and execute the actions that you want to run programmatically. There you'll find each request with all its headers and content, along with the response.
That's what you need to code. There are plenty of libraries to achieve this in Java. You can have a look at Jsoup if you need to parse HTML, but to run plain GET/POST requests, go for RestTemplate (in Spring) or the JAX-RS/Jersey 2 Client.
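A minimal sketch of replaying one of those captured requests with RestTemplate; the endpoint URL, header name, and token handling are placeholders you would copy from the Network panel, not a known API of the target app:

```java
import java.util.Collections;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.client.RestTemplate;

public class LegacyAppClient {

    private final RestTemplate rest = new RestTemplate();

    public String fetchPage(String apiToken) {
        HttpHeaders headers = new HttpHeaders();
        // Replay whatever the browser sent: a bearer token, a cookie, etc.
        headers.set("Authorization", "Bearer " + apiToken);
        headers.setAccept(Collections.singletonList(MediaType.TEXT_HTML));

        ResponseEntity<String> response = rest.exchange(
                "https://legacy-app.example.com/orders/list", // copied from the Network panel
                HttpMethod.GET,
                new HttpEntity<>(headers),
                String.class);

        // If the payload is HTML rather than JSON, hand it to Jsoup:
        // Document doc = Jsoup.parse(response.getBody());
        return response.getBody();
    }
}
```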
You might consider implementing a cache layer to increase performance, if the result of a query is stable over time or you can assume that for, say, 5 minutes the response to the same query will be the same.
You can create your app in your favourite language/framework. I'd recommend starting with Spring Boot + MVC + DevTools; that contains all you need, plus Jsoup if you have to parse some HTML. Later on you can add a cache provider if needed.
We do something similar to access web banking on behalf of a user, scrape their account data, and obtain a credit score. In most cases we have managed to reverse-engineer mobile apps and sniff traffic to use undocumented APIs; in others we have had to fall back to web scraping.
The applications you scrape generally fall into two types:
Data is essentially the same for every user, like product listings on Amazon.
Data is specific to each user, like in a banking app.
In the first case, you can have your scraper running and populating a local database, and use your local data to provide the web service. In the latter case you cannot do that, and you need to scrape the site on the user's request.
I understand from your explanation that you are in this latter case.
When web scraping, you can run into really difficult web apps:
Some may require you to send data from previous requests in the next one.
Others render most data on the client with JavaScript.
If either of these is your case, Selenium will make your implementation easier, though not performant.
Implementing the first without Selenium will require lots of trial and error to get things working, because you will be simulating the requests and you will need to know what data the server expects from the client. If you use Selenium, you will be executing the same interactions that you perform in the browser, and hence sending the expected data.
Implementing the second case requires your scraper to support JavaScript. AFAIK the best support is provided by Selenium; HtmlUnit claims to provide fair support, and I think Jsoup provides no JavaScript support at all.
Finally, if your solution takes too much time, you can mitigate the problem by providing your web service with a notification mechanism, similar to webhooks or REST hooks (a code sketch follows this list):
A client of your web service makes a request for data, providing a URI at which they would like to be notified when the results are ready.
Your service responds immediately with an id for the request and starts scraping the necessary info in the background.
If you use a skinny payload model, then when the scraping is done you store the response in your data store under an id identifying the original request. This response is exposed as a resource.
You execute an HTTP POST to the URI provided by the client, with the URI of the response resource in the request body.
The client can now GET the response resource, and because the request and the response share the same id, the client can correlate the two.
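A hedged sketch of that flow with Spring MVC; every name here (ScrapeRequest, runScrape, the URI layout) is illustrative:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class ScrapeRequestController {

    private final RestTemplate rest = new RestTemplate();

    public static class ScrapeRequest {
        public String query;        // what to scrape
        public String callbackUri;  // where to notify the client
    }

    @PostMapping("/scrape-requests")
    public ResponseEntity<Map<String, String>> submit(@RequestBody ScrapeRequest req) {
        String id = UUID.randomUUID().toString();

        // Scrape in the background; respond to the client immediately.
        CompletableFuture.runAsync(() -> {
            String responseUri = runScrape(id, req.query);
            // Skinny payload: POST only the response resource's URI back.
            rest.postForEntity(req.callbackUri,
                    Map.of("responseUri", responseUri), Void.class);
        });

        return ResponseEntity.accepted().body(Map.of("requestId", id));
    }

    private String runScrape(String id, String query) {
        // Hypothetical: scrape, persist the result under `id`, and expose
        // it as a resource the client can GET.
        return "/scrape-results/" + id;
    }
}
```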
Selenium isn't the best way to consume web services; it is primarily an automation tool, largely used for testing applications.
Assuming the services are already developed, the first thing to do is authenticate the user's request.
This can be done by adding an HTTP header with the key "Authorization" and the value "Basic " + Base64Encode(username + ":" + password).
If the user is valid (the login credentials match the credentials on the server), generate a unique token, store the token on the server mapped to the user id, and set the same token in the response header or in a cookie containing the token.
This way we can avoid validating credentials on subsequent requests from the same user, by just checking the token they send back in the request header or cookie.
If the services are designed to check the login every time, the "Authorization" header needs to be set on every request.
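For reference, building that header needs nothing beyond the JDK (a minimal sketch):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class BasicAuth {

    // Produces the value for the "Authorization" header described above.
    public static String header(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}

// Usage, e.g. with HttpURLConnection:
// connection.setRequestProperty("Authorization", BasicAuth.header(user, pass));
```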
I think using a WebDriver is a lot of overhead, but it depends on what you really want to achieve. With the info you provided, I would rather go with a RestTemplate implementation sending the appropriate HTTP messages to the existing webapp, wrap it in a nice @Service layer, and build your web service (REST or SOAP) on top of it.
The authentication is a matter of configuration: you can pack this in a microservice with @EnableOAuth2Sso, and your RestTemplate bean, thanks to Spring Boot, will handle the underlying auth part for you.
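A minimal sketch of that configuration, assuming the (now legacy) Spring Security OAuth stack that @EnableOAuth2Sso comes from:

```java
import org.springframework.boot.autoconfigure.security.oauth2.client.EnableOAuth2Sso;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.oauth2.client.OAuth2ClientContext;
import org.springframework.security.oauth2.client.OAuth2RestTemplate;
import org.springframework.security.oauth2.client.resource.OAuth2ProtectedResourceDetails;

@Configuration
@EnableOAuth2Sso
public class SsoClientConfig {

    @Bean
    public OAuth2RestTemplate oauth2RestTemplate(OAuth2ProtectedResourceDetails resource,
                                                 OAuth2ClientContext context) {
        // This RestTemplate attaches the current user's access token to
        // every outgoing request automatically.
        return new OAuth2RestTemplate(resource, context);
    }
}
```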
Maybe overkill... but RPA? http://windowsitpro.com/scripting/review-automation-anywhere-enterprise

How to log stack traces on AWS Elastic Beanstalk to quickly troubleshoot customer errors

Maybe I'm going about this the wrong way, or not asking Google the right way, so I apologize if this is a duplicate. I want to be able to quickly and efficiently troubleshoot client/user errors in my Java webapp.
That said, we can easily see where our errors are on a local machine by calling e.printStackTrace(). I want to know the best way to do this on an EC2/Elastic Beanstalk instance. If a customer gets an error, I want them to be able to email asking what happened with "ticket number X" or "error on this date around this time". My thought was, when an exception happens, to log the stack trace in my DB along with the username and return the DB's unique key to the user, so they can email me inquiring about that key/ID number.
I don't know if it is a good idea to log substrings of a stack trace to a DB (with all those DB connections opening and closing just to log stack traces).
When I look up this topic, I get a lot about AWS CloudWatch, but that looks more like a tool to log everything happening in my environment and on my server, which seems like a recipe for scrolling through log files forever.
Is logging a stack-trace substring to a DB with a username a good idea? Is there a better way to get back to my users in a timely fashion about what might have happened?
You can make an API call to CloudWatch Logs to send a single log message, instead of sending your entire Tomcat log files to CloudWatch. You can also send logs in JSON format so that you can include metadata like your user ID, unique key, etc. See the AWSLogsClient class in the SDK.
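A hedged sketch with the AWS SDK for Java v1; the log group and stream names and the JSON layout are illustrative, and a production version must also track the sequence token that each putLogEvents call returns:

```java
import com.amazonaws.services.logs.AWSLogs;
import com.amazonaws.services.logs.AWSLogsClientBuilder;
import com.amazonaws.services.logs.model.InputLogEvent;
import com.amazonaws.services.logs.model.PutLogEventsRequest;

public class ErrorLogger {

    private final AWSLogs logs = AWSLogsClientBuilder.defaultClient();

    public void logError(String userId, String errorId, String stackTrace) {
        // JSON message so the user id and error id are searchable as metadata.
        String message = String.format(
                "{\"userId\":\"%s\",\"errorId\":\"%s\",\"stackTrace\":\"%s\"}",
                userId, errorId, stackTrace);

        logs.putLogEvents(new PutLogEventsRequest()
                .withLogGroupName("my-webapp/errors")    // assumed group name
                .withLogStreamName("customer-errors")    // assumed stream name
                .withLogEvents(new InputLogEvent()
                        .withTimestamp(System.currentTimeMillis())
                        .withMessage(message)));
    }
}
```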
Note that there are also third-party services like Loggly and LogEntries that are really good at this sort of thing, and in my opinion are much more user-friendly than CloudWatch Logs is currently. These services also provide APIs to send individual log messages from Java if you don't want to stream your entire log files to them.
I would recommend using a service designed for logging like the ones I've listed, instead of a database. These services will scale well without the need to worry about things like the number of open DB connections. These services will also handle sending alerts based on the contents of log messages, and provide a nice web interface for viewing the log messages.
If you do decide to go with a database approach I would recommend looking into DynamoDB for this instead of a relational database like MySQL or PostgreSQL. DynamoDB will scale easily, stores documents easily as JSON data, and can trigger Lambda functions which could be used to do things like sending alerts.

How to programmatically verify login credentials for a web form?

I'm building an app to let users export data from a university system. Currently, they can log in and see the data in HTML, but I would like to let people download it as CSV.
I have an app where users supply their username and password. I would like to log in to the university system and HTML scrape the resulting page. How can I do this?
I'm building a GWT app, so I could do this either in Java-transliterated-JS on the client, or in Java on the server.
Update: Selenium might be nice, but it looks like overkill.
You're going to have to do this from the server unless the domains are the same. You'd need to determine what the POST transaction used by the other server for the login step looks like (parameter names etc.). Then you'd perform that operation and do whatever you want with what comes back. If you need to see multiple pages, you also need to maintain the appropriate session cookie, so that the server knows you're still logged in on the subsequent HTTP requests.
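A minimal sketch of that flow with the JDK 11 HttpClient; the URLs and form field names are placeholders you would read off the university's real login form:

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class UniversityScraper {

    // The CookieManager keeps the session cookie between calls.
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager())
            .build();

    public String fetchDataPage(String user, String pass) throws Exception {
        String form = "username=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
                + "&password=" + URLEncoder.encode(pass, StandardCharsets.UTF_8);

        // 1. Replay the login POST exactly as the browser sends it.
        client.send(HttpRequest.newBuilder(URI.create("https://university.example.edu/login"))
                        .header("Content-Type", "application/x-www-form-urlencoded")
                        .POST(HttpRequest.BodyPublishers.ofString(form))
                        .build(),
                HttpResponse.BodyHandlers.discarding());

        // 2. The session cookie captured above is sent automatically here.
        return client.send(HttpRequest.newBuilder(URI.create("https://university.example.edu/data"))
                        .build(),
                HttpResponse.BodyHandlers.ofString())
                .body();
    }
}
```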
If you have to hit another site to validate the credentials, then I'm not so sure that people should feel comfortable providing those credentials to you. That is, if you don't have rights to check the credentials directly, why are you trustworthy to receive them? I know sometimes people need to integrate with a system they don't own, so this is just a question.
First, this has to be done server-side because of the limitations on client scripting due to the same origin policy.
The typical way of handling the "screen scraping" you mention is to treat the web page as if it were an XML service. First examine the source code of the page; then, using an HTTP stack, craft a POST to the correct URL and read the response using a standard XML library. It will take some ingenuity to come up with a good way to dig into the XML for the piece you need that is as insulated as possible from changes to the page. Keep in mind that your system can break any time the owners of the site change their page.
Sometimes, you can't just send the POST but have to request the blank page initially in order to get hidden form values that need to be returned in the POST. You'll have to experiment to find out what it requires.
Additionally, you probably have to handle cookies as well, since they usually are an integral part of the web site's authentication and session management (though you might get lucky that the session doesn't matter between the initial POST and the first response).
Last, you may be unlucky enough that the site uses JavaScript to do part of the authentication work, which may require additional digging to understand how the credentials are posted to the site.
There are other potential barriers such as the site checking to see that the referrer is their own site, possible use of SSL (HTTPS) and so on.
I'm pretty sure that the same-origin policy enforced by web browsers will mean that you can't log in to the university's app using JavaScript running in the web browser. So the part of your program that fetches data from the university will need to run on your server. Once you have the data, you can process it either on your server or in JavaScript in the browser, but I think it would be easier to do it on the server.
See http://en.wikipedia.org/wiki/Same_origin_policy
I'm not too sure about GWT, but in general you would take the form data submitted by the user and check it against a database of usernames and hashed passwords. If the database checks out, set a session cookie that says the user is logged in.
In your pages, check if the session cookie says the user is logged in. If not, redirect to the login page; otherwise allow them to view the page.

Single Sign On without cookies in Java

I keep facing this question from my manager: how will SSO work if the client disables cookies? I don't have an answer. We are currently using JOSSO for single sign-on. Is there any open source framework that supports single sign-on without using a cookie mechanism?
In the absence of cookies, you're going to have to embed some parameter in each URL request, e.g. after logging in you assign some arbitrary id to a user and embed it in every link, such as http://mydomain.com/main?sessionid=123422234235235. It can get pretty messy, since every link has to be fixed up before it goes out the door, which slows down your content. It also has security, logging, and session-history implications which are not such a huge deal when the state is in a cookie.
It may be simpler to do a simple cookie test on logged-in users and send them off to an error page if they do not have cookies enabled.
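Note that servlet containers already support this fallback out of the box: HttpServletResponse.encodeURL() appends ;jsessionid=... to a link when the client has no session cookie. A minimal sketch:

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class MainServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        req.getSession(true); // make sure a session exists

        // encodeURL rewrites the link to /main;jsessionid=... only when
        // the container detects that cookies are unavailable.
        String link = resp.encodeURL("/main");

        resp.setContentType("text/html");
        resp.getWriter().printf("<a href=\"%s\">continue</a>%n", link);
    }
}
```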
The CAS project passes a "ticket" from the sign-on server to the consuming application as a URL query parameter; the consuming app then makes a back-channel request to the sign-on server to validate the ticket's authenticity. This negates the need for cookies and therefore works across domains, although it is a bit "chatty".
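A hedged sketch of that back-channel step: the consuming app receives ?ticket=ST-... on its callback URL and checks it against the CAS server's serviceValidate endpoint before creating its own session. The CAS base URL is a placeholder, and real code would parse the XML response properly:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CasTicketValidator {

    private static final String CAS_BASE = "https://sso.example.com/cas"; // placeholder

    public boolean isValid(String ticket, String serviceUrl) throws Exception {
        String url = CAS_BASE + "/serviceValidate"
                + "?ticket=" + URLEncoder.encode(ticket, StandardCharsets.UTF_8)
                + "&service=" + URLEncoder.encode(serviceUrl, StandardCharsets.UTF_8);

        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());

        // CAS answers with XML; <cas:authenticationSuccess> marks a valid ticket.
        return resp.body().contains("authenticationSuccess");
    }
}
```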
Another, arguably more robust, solution is to use a product based on SAML, an industry standard for cross-domain single sign-on. There are a couple of open source products out there which use SAML, and CAS itself has a SAML extension, but they are typically quite complex to set up. Cloudseal is also based on SAML and is much simpler to use. The Cloudseal platform itself is delivered as a managed service, but all the client libraries are open source.
Of course, with all these solutions you are simply passing a security context from one server to another; the consuming application will no doubt create its own local session, so you would then need to use URL rewriting instead of cookies.
Disclaimer: I work for Cloudseal :)
