Web application design problem related to search engine optimization - Java

We have a website with a decent customer base. We recently started a subdomain that will serve specific content from the website, and we need to redirect users to the subdomain when they try to access that content on the old domain. For example, I have the old domain www.marketplace.com and a subdomain www.paper.marketplace.com. The same web application serves both domains. So when a user tries to access the URL 'www.marketplace.com/paper/viewarticle', he should be redirected to 'www.paper.marketplace.com/paper/viewarticle'. Since it's the same web application serving both domains, I wanted to do this using a servlet that redirects the user to the subdomain based on certain configuration. I've thought about using a properties file in each folder with a flag that determines whether requests for the .html/.jsp files in that folder should be redirected or not.
The .jsp/.html files can be added to or removed from the deployment at runtime, which is also a key reason for choosing this design.
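Roughly, the redirect logic I have in mind would look like the sketch below (written here as a servlet filter; the properties file location, the property name and the hard-coded paths are placeholders for whatever the real, per-folder configuration ends up being).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SubdomainRedirectFilter implements Filter {

    private final Properties redirectConfig = new Properties();

    @Override
    public void init(FilterConfig config) throws ServletException {
        // Placeholder location; the real design would read a properties file per content folder.
        try (InputStream in = config.getServletContext()
                .getResourceAsStream("/WEB-INF/redirect.properties")) {
            if (in != null) {
                redirectConfig.load(in);
            }
        } catch (IOException e) {
            throw new ServletException("Could not load redirect configuration", e);
        }
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String path = request.getRequestURI();   // e.g. /paper/viewarticle
        String host = request.getServerName();   // e.g. www.marketplace.com

        // Redirect only when the content is flagged and we are still on the old domain.
        if (path.startsWith("/paper") && !host.contains("paper.")
                && "true".equals(redirectConfig.getProperty("paper.redirect"))) {
            // A permanent (301) redirect so search engines move the content to the subdomain.
            response.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
            response.setHeader("Location", "https://www.paper.marketplace.com" + path);
            return;
        }
        chain.doFilter(req, res);
    }

    @Override
    public void destroy() {
    }
}
```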
Please comment on this approach, or suggest any other ideas if you think they are better.
Thanks.

You can do this redirect with Apache rewrite rules, as long as the redirect follows some sort of convention. That way the overhead on your server of parsing the request, redirecting, and parsing the request again is reduced to just parsing the correct request. This will definitely improve your site's performance by reducing the number of requests it has to handle.
Rewrites can be done with other servers too, not just Apache; Apache is just the best documented. Read the Apache Rewrite Guide for more info.

Related

How can I connect to Alfresco documents through a custom web application

Our Java web application uses Alfresco as its DMS. The application uses one single system user to connect to Alfresco and manages the access rights itself with some business logic.
Now what I'd like to accomplish is to be able to use the MS Office URIs to do online editing of Word documents that live in Alfresco. So that's, for example, a URL that looks like ms-word:ofe|u|https://ourwebapp.com/documents/mydocument.docx
However if we open our documents like this, the user would end up being able to do stuff on Alfresco that we don't want them to do.
Because we want to keep our documents safe and secure, we don't want the users to be able to get the Alfresco documents "directly", but through our app. Opening Alfresco documents directly would mean that each individual user should get a unique Alfresco username/password and we don't have that and we don't want that because we already have lots and lots of documents living in Alfresco.
Surely there are other companies running into this problem? I.e. using their DMS with one single system user?
What I've already tried is to make a REST endpoint. A Spring filter ensures that an authorisation header with username/password is added, and the request is forwarded to Alfresco; the response from Alfresco is then passed back to the user. However, this results in a document that's opened in read-only mode at best. Furthermore, it doesn't seem very secure to set up a connection for the user with these system user credentials. For all I know, the user would be able to do things in Alfresco he isn't supposed to do, like editing or even viewing other documents.
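Stripped down, that endpoint looks roughly like the sketch below (the Alfresco URL and the system user credentials are placeholders, and our own authorisation check is only hinted at in a comment):

```java
import java.util.Base64;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class DocumentProxyController {

    // Placeholder Alfresco endpoint and system user credentials.
    private static final String ALFRESCO_CONTENT_URL =
            "https://alfresco.example.com/alfresco/api/-default-/public/alfresco/versions/1/nodes/%s/content";
    private static final String SYSTEM_USER = "systemuser";
    private static final String SYSTEM_PASSWORD = "secret";

    private final RestTemplate restTemplate = new RestTemplate();

    @GetMapping("/documents/{nodeId}")
    public ResponseEntity<byte[]> getDocument(@PathVariable String nodeId) {
        // Our own business logic decides here whether the current user may see this node.

        // Add the system user's credentials so Alfresco accepts the request.
        HttpHeaders headers = new HttpHeaders();
        String token = Base64.getEncoder()
                .encodeToString((SYSTEM_USER + ":" + SYSTEM_PASSWORD).getBytes());
        headers.set(HttpHeaders.AUTHORIZATION, "Basic " + token);

        // Fetch the content from Alfresco and hand it straight back to the caller.
        return restTemplate.exchange(
                String.format(ALFRESCO_CONTENT_URL, nodeId),
                HttpMethod.GET,
                new HttpEntity<>(headers),
                byte[].class);
    }
}
```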
There's very little documentation on how the ms-word protocol exactly works; maybe you can point me in the right direction, or suggest some workarounds I might try out?
For this to work using the SharePoint protocol (SPP), you would have to reimplement the whole protocol server in front of your application, since you control the access yourself. There is no free or even generally available SPP implementation I know of that you could (re)use for this.
The Alfresco protocol server may not be an option, since you can't, or don't want to, mirror your app's access control into Alfresco. If you give users access to a system like Alfresco or SharePoint via a file protocol, they will get too many access rights, as you already described. By following the concept of a single application user, you may be locked out of Alfresco's concepts for end users if you can't mirror the access logic into Alfresco.
Years ago we implemented a dynamic low-level access voter to up- or downgrade access inside Alfresco's node service, to allow specific permissions based on types and metadata. In the same way, someone could implement an interface to another system that delegates permission checks based on external data, but this would slow down all the systems involved dramatically.
We have a similar requirement, since we access documents and data from several enterprise sources, including Alfresco, from our own business process product, which has a rule- and process-based access concept built on the cases and processes the documents are involved in, not on folders or the documents' static ACLs. We use a local service installed on the client, partnering with the browser app, for downloading, opening and saving back documents from a local temporary (checked-out) path after the file is closed. Our local client has no knowledge of Alfresco and is authenticated only against our services using JSON Web Tokens.
So my answer is more of a concept than a ready-to-go solution, offered in the hope that it is helpful.

Exposing a web site through web services

I know what I am asking is somewhat weird. There is a web application (we don't have access to its source code), and we want to expose a few of its features as web services.
I was thinking of using something like Selenium WebDriver, so I could simulate web clicks on the application according to the web service request.
I want to know whether there is a better solution or pattern for doing this.
I should mention that the application is written using Java, Spring MVC (it is not an SPA) and Spring Security, and that there is a CAS server providing SSO.
There are multiple ways to implement this. In my opinion Selenium/PhantomJS is not the best option: if the site is properly designed, you can interact with it using just the served HTML, or even some API, rather than needing all the CSS and executing the JavaScript async requests. As your page is not an SPA, it's quite likely that an "API" already exists in the form of GET/POST requests, and you might be lucky enough that there's no CSRF protection.
First of all, you need to solve the authentication against the CAS. There are multiple types of authentication in OAuth, but you should get an API token that gives you access to the application. This token should be added as an HTTP header or cookie on every single request. Ideally this token shouldn't expire; otherwise you'll need to implement re-authentication logic in your app.
Once the authentication part is resolved, you'll need quite a lot of patience: open the target website with the web inspector of your preferred browser, go to the Network panel, and execute the actions that you want to run programmatically. There you'll find each request with all its headers and content, and the corresponding response.
That's what you need to code. There are plenty of libraries to achieve this in Java. You can have a look at Jsoup if you need to parse HTML, but for plain GET/POST requests go for RestTemplate (in Spring) or a JAX-RS/Jersey 2 client.
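A minimal sketch of that combination (the URL, the header values and the CSS selector below are placeholders you would replace with what you see in the Network panel):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpMethod;
import org.springframework.web.client.RestTemplate;

public class ScraperExample {

    public static void main(String[] args) {
        RestTemplate restTemplate = new RestTemplate();

        // Placeholder token/cookie obtained from the authentication step described above.
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer <api-token>");
        headers.set("Cookie", "JSESSIONID=<session-id>");

        // Replay the request captured in the browser's Network panel.
        String html = restTemplate.exchange(
                "https://target.example.com/some/page",
                HttpMethod.GET,
                new HttpEntity<>(headers),
                String.class).getBody();

        // Parse the returned HTML with Jsoup and pull out the fragment you need.
        Document page = Jsoup.parse(html);
        System.out.println(page.select("h1.article-title").text());
    }
}
```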
You might consider implementing a cache layer to increase performance if the result of a query is stable over time, or if you can assume that, for say 5 minutes, the response to the same query will be the same.
You can create your app in your favourite language/framework. I'd recommend starting with Spring Boot + MVC + DevTools; that contains all you need, plus Jsoup if you need to parse some HTML. Later on you can add a cache provider if needed.
We do something similar to access web banking on behalf of a user, scrape his account data and obtain a credit score. In most cases, we have managed to reverse-engineer mobile apps and sniff traffic to use undocumented APIs. In others, we have to fall back to web scraping.
There are two different types of applications you may have to scrape:
Data is essentially the same for any user, like product listings in Amazon
Data is specific to each user, like in a banking app.
In the first case, you could have your scraper running and populating a local database, and then use your local data to provide the web service. In the latter case, you cannot do that, and you need to scrape the site on the user's request.
I understand from your explanation that you are in this latter case.
When web scraping you can find really difficult web apps:
Some may require you to send data from previous requests to the next
Others render most data on the client with JavaScript
If either of these is your case, Selenium will make your implementation easier, though not performant.
Implementing the first without Selenium will require lots of trial and error to get things working, because you will be simulating the requests and you will need to know what data the server expects from the client. If you use Selenium, on the other hand, you will be executing the same interactions that you do with the browser, and hence sending the expected data.
Implementing the second case requires your scraper to support JavaScript. AFAIK the best support is provided by Selenium; HtmlUnit claims to provide fair support, and I think Jsoup provides no JavaScript support at all.
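For the JavaScript-heavy case, a bare-bones Selenium sketch looks like this (the URL and element selectors are placeholders, and a matching driver binary must be on the PATH):

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class SeleniumScrapeExample {

    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://target.example.com/login");

            // Placeholder element ids; inspect the real page to find them.
            driver.findElement(By.id("username")).sendKeys("user");
            driver.findElement(By.id("password")).sendKeys("secret");
            driver.findElement(By.id("login-button")).click();

            // Content rendered by JavaScript is available once the page has loaded.
            String value = driver.findElement(By.cssSelector(".account-balance")).getText();
            System.out.println(value);
        } finally {
            driver.quit();
        }
    }
}
```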
Finally, if your solution takes too much time, you can mitigate the problem by providing your web service with a notification mechanism, similar to Webhooks or Resthooks (see the sketch after this list):
A client of your web service makes a request for data, providing a URI at which it would like to be notified when the results are ready.
Your service responds immediately with an id for the request and starts scraping the necessary info in the background.
If you use a skinny payload model, then when the scraping is done you store the response in your data store with an id identifying the original request. This response is exposed as a resource.
You execute an HTTP POST on the URI provided by the client, and in the body of that request you put the URI of the response resource.
The client can now GET the response resource, and because the request and the response share the same id, the client can correlate the two.
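A sketch of that flow, with made-up service names and a skinny JSON payload, could look like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

public class NotificationExample {

    private final HttpClient httpClient = HttpClient.newHttpClient();

    // Called when the client requests data and supplies a callback URI.
    public String startScrape(String query, String callbackUri) {
        String requestId = UUID.randomUUID().toString();
        CompletableFuture.runAsync(() -> {
            String result = scrape(query);        // the long-running scraping work
            store(requestId, result);             // expose the result as a resource
            notifyClient(callbackUri, requestId); // tell the client it is ready
        });
        return requestId;                         // returned to the client immediately
    }

    private void notifyClient(String callbackUri, String requestId) {
        // Skinny payload: only the URI of the response resource is sent.
        String body = "{\"responseUri\":\"https://my-service.example.com/responses/" + requestId + "\"}";
        HttpRequest request = HttpRequest.newBuilder(URI.create(callbackUri))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        httpClient.sendAsync(request, HttpResponse.BodyHandlers.discarding());
    }

    // Placeholders for the actual scraping and storage logic.
    private String scrape(String query) { return "..."; }
    private void store(String id, String result) { }
}
```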
Selenium isn't the best way to consume web services; it is primarily an automation tool, largely used for testing applications.
Assuming the services are already developed, the first thing we need to do is authenticate the user's request.
This can be done by adding an HTTP header with the key "Authorization" and the value "Basic " + Base64Encode(username + ":" + password).
If the user is valid (the login credentials match the credentials on the server), then generate a unique token, store it on the server mapped to the user id, and set the same token in the response header or in a cookie.
By doing this we can avoid validating credentials on subsequent requests from the same user, by just looking for the token in the request header or cookie.
If the services are designed to check the login every time, the "Authorization" header needs to be set on every request that is made.
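For reference, building and sending that header in plain Java might look like the sketch below; the endpoint URL and the credentials are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthExample {

    public static void main(String[] args) throws Exception {
        // "Basic " + Base64Encode(username + ":" + password)
        String credentials = Base64.getEncoder()
                .encodeToString("username:password".getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest
                .newBuilder(URI.create("https://target.example.com/api/data"))
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A token returned by the server (header or cookie) can be reused on later requests.
        response.headers().firstValue("Set-Cookie").ifPresent(System.out::println);
        System.out.println(response.body());
    }
}
```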
I think using a WebDriver is a lot of overhead, but it depends on what you really want to achieve. With the info you provided, I would rather go with a RestTemplate implementation that sends the appropriate HTTP messages to the existing webapp, wrap it with a nice @Service layer, and build your web service (REST or SOAP) on top of it.
The authentication is a matter of configuration: you can pack this into a microservice with @EnableOAuth2Sso, and your RestTemplate bean, thanks to Spring Boot, will handle the underlying auth part for you.
Maybe overkill... but RPA? http://windowsitpro.com/scripting/review-automation-anywhere-enterprise

Embedding a website into another

The first part of the requirement is to be able to embed our website (say www.mysite.com) into our partners' websites (say www.partner1.com).
The second part is that a customer visiting our pages from a partner's website should be able to bookmark our pages (the bookmarked link www.partner1.com/page/sample.jsp should take the user to www.mysite.com/page/sample.jsp internally). The customer should see www.partner1.com/page/sample.jsp in the address bar and should not know that the page is being served by www.mysite.com.
Hence an iframe is out of the question. Is there a better way of doing this? Is there a product that supports this kind of embedding?
We eventually want to embed our site with multiple partners and provide the same bookmarking feature.
What you're looking for is a reverse proxy. This is a web server component, running on your partner's site, that fetches pages from your server and delivers them to the end customer. Apache defines a reverse proxy as:
A reverse proxy (or gateway), by contrast, appears to the client just like an ordinary web server. No special configuration on the client is necessary. The client makes ordinary requests for content in the name-space of the reverse proxy. The reverse proxy then decides where to send those requests, and returns the content as if it was itself the origin.
This in turn is more of a system administration question than a programming question. I don't know what web server your partner is using, but the relevant Apache documentation is here. If you want secrecy, check the web server headers carefully to ensure information doesn't leak out.

How to programmatically verify login credentials for a web form?

I'm building an app to let users export data from a university system. Currently, they can log in and see the data in HTML, but I would like to let people download it as CSV.
I have an app where users supply their username and password. I would like to log in to the university system and HTML scrape the resulting page. How can I do this?
I'm building a GWT app. I could either do this in Java-transliterated-JS on the client, or Java on the server.
Update: Selenium might be nice, but it looks like overkill.
You're going to have to do this from the server unless the domains are the same. You'd need to determine what the POST transaction the other server uses for the login step looks like (parameter names, etc.). Then you'd perform that operation and do whatever you want with what comes back. If you need to see multiple pages, you also need to maintain the appropriate session cookie, so that the server knows you're still logged in on the subsequent HTTP requests.
If you have to hit another site to validate the credentials, then I'm not so sure that people should feel comfortable providing those credentials to you. That is, if you don't have rights to check the credentials directly, why are you trustworthy to receive them? I know sometimes people need to integrate with a system they don't own, so this is just a question.
First, this has to be done server-side because of the limitations on client scripting due to the same origin policy.
The typical way of handling the "screen scraping" you mention is to treat the web page as if it were an XML service. First, examine the source code of the page; then, using an HTTP stack, craft a POST to the correct URL and read the response using a standard XML library. It will take some ingenuity to come up with a good way of digging into the XML to find the piece you need that is as insulated as possible from changes to the page. Keep in mind that your system can break any time the owners of the site change their page.
Sometimes you can't just send the POST, but have to request the blank page first in order to get hidden form values that need to be returned with the POST. You'll have to experiment to find out what the site requires.
Additionally, you probably have to handle cookies as well, since they are usually an integral part of the web site's authentication and session management (though you might get lucky and find that the session doesn't matter between the initial POST and the first response).
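Put together, such a flow might look like the sketch below (the URLs and form field names are placeholders you would read out of the real login page):

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FormLoginExample {

    public static void main(String[] args) throws Exception {
        // The CookieManager keeps the session cookie between requests.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();

        // 1. Fetch the login page first; some sites embed hidden form fields you must echo back.
        HttpResponse<String> loginPage = client.send(
                HttpRequest.newBuilder(URI.create("https://university.example.edu/login")).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        // ... parse loginPage.body() for hidden inputs if the site requires them ...

        // 2. Post the credentials as form data, using the field names from the page source.
        String form = "username=student&password=secret";
        client.send(
                HttpRequest.newBuilder(URI.create("https://university.example.edu/login"))
                        .header("Content-Type", "application/x-www-form-urlencoded")
                        .POST(HttpRequest.BodyPublishers.ofString(form))
                        .build(),
                HttpResponse.BodyHandlers.ofString());

        // 3. Subsequent requests reuse the session cookie automatically.
        HttpResponse<String> data = client.send(
                HttpRequest.newBuilder(URI.create("https://university.example.edu/grades")).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(data.body());
    }
}
```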
Last, you may be unlucky enough that the site uses JavaScript to do part of the authentication work, which may require additional digging to understand how the credentials are posted to the site.
There are other potential barriers such as the site checking to see that the referrer is their own site, possible use of SSL (HTTPS) and so on.
I'm pretty sure that the same-origin protections in web browsers mean that you can't log in to the university's app using JavaScript running in the browser. So the part of your program that fetches data from the university will need to run on your server. Once you have the data, you can process it either on your server or in JavaScript in the browser, but I think it would be easier to do it on the server.
See http://en.wikipedia.org/wiki/Same_origin_policy
I'm not too sure about GWT, but in general you would take the form data submitted by the user and check it against a database of usernames and hashed passwords. If the database checks out, set a session cookie that says the user is logged in.
In your pages, check whether the session cookie says the user is logged in. If not, redirect to the login page; otherwise allow them to view the page.
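In plain servlet terms, a sketch of that login check might look like this (the credential lookup is a placeholder; use a proper password hash such as bcrypt):

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class LoginServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        String username = request.getParameter("username");
        String password = request.getParameter("password");

        if (checkCredentials(username, password)) {
            HttpSession session = request.getSession(true); // issues the session cookie
            session.setAttribute("user", username);
            response.sendRedirect("/home");
        } else {
            response.sendRedirect("/login?error=1");
        }
    }

    // Protected pages can call this before rendering and redirect to /login if it returns false.
    static boolean isLoggedIn(HttpServletRequest request) {
        HttpSession session = request.getSession(false);
        return session != null && session.getAttribute("user") != null;
    }

    private boolean checkCredentials(String username, String password) {
        // Placeholder: look up the user and compare password hashes here.
        return false;
    }
}
```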

Single Sign On without cookies in Java

I keep facing this question from my manager: how will SSO work if the client disables cookies? I don't have an answer. We are currently using JOSSO for single sign-on. Is there any open source framework which supports single sign-on without using a cookie mechanism?
In the absence of cookies, you're going to have to embed some parameter in each URL request, e.g. after logging in you assign some arbitrary id to a user and embed it in every link, such as http://mydomain.com/main?sessionid=123422234235235. It can get pretty messy, since every link has to be fixed up before it goes out the door, which slows down your content. It also has security, logging and session history implications which are not such a huge deal when the state is in a cookie.
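In a Java servlet container this kind of URL rewriting is already built into the API: HttpServletResponse.encodeURL() appends the session id to a link when the client does not accept cookies. A minimal sketch:

```java
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LinkServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        request.getSession(true); // make sure a session exists

        // encodeURL() leaves the link untouched if the session cookie was accepted;
        // otherwise it rewrites it to something like /main;jsessionid=123422234235235
        String link = response.encodeURL("/main");
        response.getWriter().println("<a href=\"" + link + "\">main</a>");
    }
}
```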
It may be simpler to do a simple cookie test on logged in users and send them off to an error page if they do not have cookies enabled.
The CAS project passes a "ticket" from the sign-on server to the consuming application as a URL query parameter; the consuming app then makes a back-channel request to the sign-on server to validate the ticket's authenticity. This negates the need for cookies and therefore works across domains, though it is a bit "chatty".
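The back-channel validation step can be as simple as the sketch below; the CAS server and service URLs are placeholders, and parsing of the XML response is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CasTicketValidator {

    // Placeholder CAS server and service URLs.
    private static final String CAS_SERVER = "https://cas.example.com/cas";
    private static final String SERVICE_URL = "https://myapp.example.com/home";

    // The app sends the ticket it received as a query parameter back to the CAS
    // server, which answers with the authenticated user (or a failure) as XML.
    public static String validate(String ticket) throws Exception {
        String url = CAS_SERVER + "/serviceValidate"
                + "?ticket=" + URLEncoder.encode(ticket, StandardCharsets.UTF_8)
                + "&service=" + URLEncoder.encode(SERVICE_URL, StandardCharsets.UTF_8);

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        return response.body(); // contains <cas:authenticationSuccess> and the username on success
    }
}
```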
Another, arguably more robust, solution is to use a product based on SAML, which is an industry standard for cross-domain single sign-on. There are a couple of open source products out there which use SAML, and CAS itself has a SAML extension, but they are typically quite complex to set up. Cloudseal is also based on SAML and is much simpler to use. The Cloudseal platform itself is delivered as a managed service, but all the client libraries are open source.
Of course, with all these solutions you are simply passing a security context from one server to another; the consuming application will no doubt create its own local session, so you would then need to use URL rewriting instead of cookies there as well.
Disclaimer: I work for Cloudseal :)
