Google App Engine - Uploading blobs and authentication - java

(I tried asking this on the GAE forums but didn't get an answer so am trying it here.)
Currently, to upload blobs, App Engine's Blobstore service creates a unique one-time URL that a user can POST blobs to. My requirement is that only authenticated / authorized users should be able to post blobs in my application. I can achieve this today if the page that includes the multipart upload form is part of my application.
However, I am looking to provide a "REST API" for my users to upload their blobs. While it is true that the one-time nature of the upload URL mitigates the chances of rogue use, it's still possible.
I was wondering whether anyone on the App Engine team here could consider a feature where developers can register an upload listener (or, if there is already a way, I'm all ears). A standard servlet filter could also potentially do the job. This would give us an opportunity to authenticate / validate / decorate requests before they get forwarded to the Blobstore service.
Thanks,
Keyur

Since, as you point out, it's only possible to upload blobs if you have a valid upload URL, you can simply issue valid upload URLs only to authorized users. The only way an unauthorized user could then get an upload URL would be if an authorized user gave it to them, or it was intercepted - and in either case, the same caveat would apply to regular credentials.
In any case, it's still possible to check a user's credentials after the upload, at which point you can immediately delete the blob if you're not satisfied. If it were possible to regularly upload unauthorized blobs, this could lead to a denial of service vulnerability, but due to the restrictions on handing out the encoded URLs I mentioned above, this is only likely to apply if, for example, a user's access was revoked after you generated an upload URL for them.
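To make the "issue upload URLs only to authorized users" approach concrete, here is a minimal sketch of a GAE/J servlet that hands out a Blobstore upload URL only to a signed-in user. The /upload-complete success path is a placeholder; adapt it to your app:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import com.google.appengine.api.blobstore.BlobstoreService;
    import com.google.appengine.api.blobstore.BlobstoreServiceFactory;
    import com.google.appengine.api.users.UserService;
    import com.google.appengine.api.users.UserServiceFactory;

    public class UploadUrlServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            UserService users = UserServiceFactory.getUserService();
            if (users.getCurrentUser() == null) {
                // No valid session: refuse to mint an upload URL at all
                resp.sendError(HttpServletResponse.SC_UNAUTHORIZED);
                return;
            }
            BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
            // The argument is the path Blobstore forwards to once the upload finishes
            String uploadUrl = blobstore.createUploadUrl("/upload-complete");
            resp.setContentType("text/plain");
            resp.getWriter().print(uploadUrl);
        }
    }

A REST client would first call this endpoint with its credentials, then POST the blob to the URL it gets back.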

I'm not sure whether it would work (i.e. GAE might not let you do it), but a servlet filter which wraps the /_ah/upload pattern could first check whether the POST came from the same IP address as the authenticated client.
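For illustration, such a filter might look like the sketch below. Whether GAE actually lets you map a filter over /_ah/upload/* is exactly the open question, and the "authIp" session attribute is a hypothetical value you would store at login time:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class UploadIpFilter implements Filter {
        @Override public void init(FilterConfig cfg) {}
        @Override public void destroy() {}

        @Override
        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest http = (HttpServletRequest) req;
            HttpSession session = http.getSession(false);
            // "authIp" is assumed to have been stored in the session at login time
            String expectedIp = (session == null) ? null : (String) session.getAttribute("authIp");
            if (expectedIp == null || !expectedIp.equals(http.getRemoteAddr())) {
                ((HttpServletResponse) res).sendError(HttpServletResponse.SC_FORBIDDEN);
                return;
            }
            chain.doFilter(req, res);
        }
    }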

You can now upload files with the Blobstore API; check out the docs here: http://code.google.com/appengine/docs/java/blobstore/overview.html

Related

How do I hide a Spring Boot "GET" API request from the public in a browser?

As the question states, my goal is to hide a GET route in Spring Boot from being accessed from the public. I originally took a CORS approach, but that doesn't solve the actual view problem. Pretty much anyone could go to, say... https://my-api-url.com/employee/all and see a JSON record of all employees in my database.
END GOAL: I only want my front-end to have access to my API for displaying that information to an authorized user who is signed in, but I do NOT want just anyone to have access to the API. CORS policy can handle the ajax requests, but it doesn't seem like I can stop the overall viewing of the GET url.
How can I solve this problem?
You can use OAuth to register clients (frontend / Postman / whatever you are using to test the API) that can access your resource server, but it might be overkill. For now, if you are worried that someone can view your API by typing the URL in the address bar (if that is your question), you can allow access for authenticated users only.
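If the backend uses (or adds) Spring Security, a minimal sketch of that "authenticated users only" rule could look like this; the /employee/** pattern is taken from the question, and the login mechanism here is an assumption to adapt:

    import org.springframework.context.annotation.Configuration;
    import org.springframework.security.config.annotation.web.builders.HttpSecurity;
    import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
    import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

    @Configuration
    @EnableWebSecurity
    public class ApiSecurityConfig extends WebSecurityConfigurerAdapter {
        @Override
        protected void configure(HttpSecurity http) throws Exception {
            http
                .authorizeRequests()
                    // Block anonymous browsing of the employee JSON
                    .antMatchers("/employee/**").authenticated()
                    .anyRequest().permitAll()
                .and()
                // Or formLogin(), depending on how the frontend authenticates
                .httpBasic();
        }
    }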
If you want to restrict usage and make it inconvenient for abusers to call your API, you can issue a token on page load (a CSRF token) and require that token to be present in requests to the API - that way the API will be callable only from a browser that initiated a page load.
You can refer to this link: https://security.stackexchange.com/questions/246434/how-can-i-ensure-my-api-is-only-called-by-my-client
If your frontend is currently handling authentication, I'd suggest moving to Spring's authentication service (Spring Security). That way you can prevent unauthenticated users from accessing that specific API endpoint.

How to properly use s3 to deliver and store files in a web application?

So we are planning to move static content to s3 for operational reasons. I just want to understand where to place s3 in the workflow of handling a request.
If website requires an image, should the request hit our service first which would fetch the image from s3 (reverse-proxy) or should client directly request the file.
How to hide file names / path names and manage permissions when requesting a file?
Same questions applicable for uploading new content.
Handle s3 quota and parallel requests
I was going to comment, but this turned into a full answer instead...
Either. If your assets are public, the lowest-weight method is to just request them from a public S3 bucket. If they're not, though, it's probably easiest to use CloudFront rather than rolling your own auth around S3 requests.
You can make it look like your asset A.jpeg in S3 (yourBucket/A.jpeg) is at yourWebsite.com/A.jpeg using CloudFront. If you want to also obscure the filename A, you need to use e.g. API Gateway to serve the file without revealing anything about it to your front end. If it were me, I wouldn't bother.
Unless you absolutely have to, don't let users upload to the same bucket that other users download from. There are several approaches to uploads depending on the use case. Pre-signed URLs are good for one-time use. You can also just provide the user with AWS credentials that are allowed only to write to the upload bucket, by using Cognito.
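As a sketch of the pre-signed URL approach with the AWS SDK for Java (v1), where the bucket and key names are placeholders:

    import java.net.URL;
    import java.util.Date;
    import com.amazonaws.HttpMethod;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

    public class PresignExample {
        // Returns a URL the client can PUT the file to, valid for 15 minutes
        public static URL presignUpload(String bucket, String key) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            Date expiry = new Date(System.currentTimeMillis() + 15 * 60 * 1000);
            GeneratePresignedUrlRequest request =
                    new GeneratePresignedUrlRequest(bucket, key)
                            .withMethod(HttpMethod.PUT)
                            .withExpiration(expiry);
            return s3.generatePresignedUrl(request);
        }
    }

Your service authenticates the user, calls presignUpload, and hands the URL back; the user's browser then uploads directly to S3 without your service ever touching the bytes.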
There's no S3 quota. You get charged for reads and writes. For a simple site, these charges will be tiny. If you're worried, you can use Cloudfront to rate-limit your users. You can also use API Gateway to create limits for individual users. S3 is extremely parallelizable.

Exposing a web site through web services

I know what I am asking is somehow weird. There is a web application (which we don't have access to its source code), and we want to expose a few of its features as web services.
I was thinking to use something like Selenium WebDriver, so I simulate web clicks on the application according to the web service request.
I want to know whether this is a better solution or pattern to do this.
I shall mention that the application is written using Java, Spring MVC (it is not SPA) and Spring Security. And there is a CAS server providing SSO.
There are multiple ways to implement it. In my opinion Selenium/PhantomJS is not the best option: if the site is properly designed, you can interact with it using only the underlying HTML, or even some API, rather than loading all the CSS and executing the JavaScript async requests. As your page is not an SPA, it's quite likely that an "API" already exists in the form of GET/POST requests, and you might be lucky enough that there's no CSRF protection.
First of all, you need to solve the authentication against the CAS. There are multiple types of authentication in OAuth, but you should get an API token that gives you access to the application. This token should be added as an HTTP header or cookie to every single request. Ideally, this token shouldn't expire; otherwise you'll need to implement re-authentication logic in your app.
Once the authentication part is resolved, you'll need quite a lot of patience: open the target website with the web inspector of your preferred browser, go to the Network panel, and execute the actions that you want to run programmatically. There you'll find each request with all its headers and content, and the response.
That's what you need to code. There are plenty of libraries to achieve that in Java. You can have a look at Jsoup if you need to parse HTML, but to run plain GET/POST requests, go for RestTemplate (in Spring) or the JAX-RS/Jersey 2 Client.
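A rough sketch of replaying such a request with RestTemplate, where the cookie name and URL are assumptions to copy from whatever the web inspector shows:

    import org.springframework.http.HttpEntity;
    import org.springframework.http.HttpHeaders;
    import org.springframework.http.HttpMethod;
    import org.springframework.http.ResponseEntity;
    import org.springframework.web.client.RestTemplate;

    public class LegacyAppClient {
        private final RestTemplate rest = new RestTemplate();

        public String fetchPage(String url, String sessionCookie) {
            HttpHeaders headers = new HttpHeaders();
            // Replay whatever the browser sent; the JSESSIONID name is an assumption
            headers.add(HttpHeaders.COOKIE, "JSESSIONID=" + sessionCookie);
            ResponseEntity<String> response =
                    rest.exchange(url, HttpMethod.GET, new HttpEntity<Void>(headers), String.class);
            return response.getBody();
        }
    }

From there, Jsoup.parse(fetchPage(...)) would let you pick the data out of the returned HTML with CSS selectors.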
You might consider implementing a cache layer to increase performance if the result of a query stays stable over time, or if you can assume that for, say, 5 minutes the response to the same query will be identical.
You can create your app in your favourite language/framework. I'd recommend starting with Spring Boot + MVC + DevTools. That would contain all you need, plus Jsoup if you need to parse some HTML. Later on you can add the cache provider if needed.
We do something similar to access web banking on behalf of a user, scrape his account data and obtain a credit score. In most cases, we have managed to reverse-engineer mobile apps and sniff traffic to use undocumented APIs. In others, we have to fall back to web scraping.
You can have two other types of applications to scrape:
Data is essentially the same for any user, like product listings in Amazon
Data is specific to each user, like in a banking app.
In the first case, you could have your scraper running and populating a local database, and use your local data to provide the web service. In the latter case, you cannot do that, and you need to scrape the site on the user's request.
I understand from your explanation that you are in this latter case.
When web scraping you can find really difficult web apps:
Some may require you to send data from previous requests to the next
Others render most data on the client with JavaScript
If either of these is your case, Selenium will make your implementation easier, though not performant.
Implementing the first without Selenium will require lots of trial and error to get the thing working, because you will be simulating the requests and you will need to know what data the client is expected to send. Whereas if you use Selenium, you will be executing the same interactions that you do with the browser, and hence sending the expected data.
Implementing the second case requires your scraper to support JavaScript. AFAIK the best support is provided by Selenium. HtmlUnit claims to provide fair support, and I think Jsoup provides no JavaScript support.
Finally, if your solution takes too much time, you can mitigate the problem by providing your web service with a notification mechanism, similar to webhooks or REST hooks:
A client of your web service would make a request for data, providing a URI at which they would like to be notified when the results are ready.
Your service would respond immediately with an id for the request and start scraping the necessary info in the background.
If you use a skinny-payload model, when the scraping is done you store the response in your data store under an id identifying the original request. This response will be exposed as a resource.
You would then execute an HTTP POST on the URI provided by the client, adding the URI of the response resource in the body of the request.
The client can now GET the response resource and because the request and response have the same id, the client can correlate both.
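A compressed sketch of that notification flow in Spring Boot might look like this; the endpoint paths, the in-memory result map (a real datastore in practice), and the scrape() stub are all illustrative:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ConcurrentHashMap;
    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.client.RestTemplate;

    @RestController
    public class ScrapeController {
        private final Map<String, String> results = new ConcurrentHashMap<>();
        private final RestTemplate rest = new RestTemplate();

        // The client supplies a callback URI; we answer immediately with a request id.
        @PostMapping("/scrape")
        public String submit(@RequestParam String query, @RequestParam String callbackUri) {
            String id = UUID.randomUUID().toString();
            CompletableFuture.runAsync(() -> {
                String data = scrape(query);           // the slow scraping work
                results.put(id, data);                 // expose the response as a resource
                // Skinny payload: POST only the URI of the response resource
                rest.postForLocation(callbackUri, "/scrape/result/" + id);
            });
            return id;
        }

        @GetMapping("/scrape/result/{id}")
        public String result(@PathVariable String id) {
            return results.get(id); // null until the background scrape finishes
        }

        private String scrape(String query) { /* scraping logic goes here */ return "..."; }
    }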
Selenium isn't the best way to consume web services. Selenium is primarily an automation tool, largely used for testing applications.
Assuming the services are already developed, the first thing we need to do is authenticate the user's request.
This can be done by adding an HTTP header with the key "Authorization" and the value "Basic " + Base64Encode(username + ":" + password).
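In Java, that header value can be built like so, for example:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class BasicAuth {
        // Builds the "Authorization: Basic ..." header value described above
        public static String headerValue(String username, String password) {
            String credentials = username + ":" + password;
            return "Basic " + Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        }
    }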
If the user is valid (the login credentials match the credentials on the server), generate a unique token, store the token on the server mapped to the user ID, and set the same token in the response header or create a cookie containing the token.
By doing this we can avoid validating credentials for subsequent requests from the same user by just looking for the token in the request header or cookie.
If the services are designed to check the login every time, the "Authorization" header needs to be set on every request.
I think using a WebDriver is a lot of overhead, but it depends on what you really want to achieve. With the info you provided, I would rather go with a RestTemplate implementation sending the appropriate HTTP messages to the existing webapp, wrap it in a nice @Service layer, and build your web service (REST or SOAP) on top of it.
The authentication is a matter of configuration; you can pack this in a microservice with @EnableOAuth2Sso, and your RestTemplate bean, thanks to Spring Boot, will handle the underlying auth part for you.
Maybe overkill... but RPA? http://windowsitpro.com/scripting/review-automation-anywhere-enterprise

How to programmatically verify login credentials for a web form?

I'm building an app to let users export data from a university system. Currently, they can log in and see the data in HTML, but I would like to let people download it as CSV.
I have an app where users supply their username and password. I would like to log in to the university system and HTML scrape the resulting page. How can I do this?
I'm building a GWT app. I could either do this in Java-transliterated-JS on the client, or Java on the server.
Update: Selenium might be nice, but it looks like overkill.
You're going to have to do this from the server unless the domains are the same. You'd need to determine what the POST transaction used by the other server for the login step looks like - parameter names etc. Then you'd perform that operation and do whatever you want with what comes back. If you need to see multiple pages, you need to maintain the appropriate session cookie too so that the server knows you're still logged in on the subsequent HTTP requests.
If you have to hit another site to validate the credentials, then I'm not so sure that people should feel comfortable providing those credentials to you. That is, if you don't have rights to check the credentials directly, why are you trustworthy to receive them? I know sometimes people need to integrate with a system they don't own, so this is just a question.
First, this has to be done server-side because of the limitations on client scripting due to the same origin policy.
The typical way of handling the "screen scraping" you mention is to treat the web page as if it was an XML service. First, examine the source code of the page, then using an internet/HTTP stack, craft a POST to the correct URL and read the response using a standard XML library. It will take some ingenuity to come up with a good way to dig into the XML to find the piece you need that will be as insulated as possible from changes to the page. Keep in mind that your system can break any time that the owners of the site change their page.
Sometimes, you can't just send the POST but have to request the blank page initially in order to get hidden form values that need to be returned in the POST. You'll have to experiment to find out what it requires.
Additionally, you probably have to handle cookies as well, since they usually are an integral part of the web site's authentication and session management (though you might get lucky that the session doesn't matter between the initial POST and the first response).
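As an illustrative sketch using Java 11's HttpClient, where a CookieManager carries the session cookie between the login POST and the data request; the URLs and form field names are placeholders to copy from the real login page:

    import java.net.CookieManager;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LoginScraper {
        public static void main(String[] args) throws Exception {
            // The CookieManager keeps the session cookie across requests automatically
            HttpClient client = HttpClient.newBuilder()
                    .cookieHandler(new CookieManager())
                    .build();

            // Field names "username"/"password" and both URLs are assumptions;
            // copy the real ones from the login form's HTML source
            String form = "username=alice&password=secret";
            HttpRequest login = HttpRequest.newBuilder(URI.create("https://university.example/login"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .POST(HttpRequest.BodyPublishers.ofString(form))
                    .build();
            client.send(login, HttpResponse.BodyHandlers.ofString());

            // Subsequent requests reuse the session cookie set during login
            HttpRequest data = HttpRequest.newBuilder(URI.create("https://university.example/grades"))
                    .GET().build();
            String html = client.send(data, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println(html);
        }
    }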
Last, you may be unlucky enough that the site uses javascript to do part of the authentication work, which may require additional digging to understand how the credentials are posted to the site.
There are other potential barriers such as the site checking to see that the referrer is their own site, possible use of SSL (HTTPS) and so on.
I'm pretty sure that the protection against cross-site scripting in web browsers will mean that you can't log in to the university's app using javascript running in the web browser. So the part of your program that fetches data from the university will need to run on your server. Once you have the data, you can process it either on your server or in javascript in the browser, but I think it would be easier to do it on the server.
See http://en.wikipedia.org/wiki/Same_origin_policy
I'm not too sure about GWT, but in general you would take the form data submitted by the user and check it against a database of usernames and hashed passwords. If the database checks out, set a session cookie that says the user is logged in.
In your pages, check if the session cookie says the user is logged in. If not, redirect to the login page; otherwise allow them to view the page.

Make two servers talk to each other

I have an application written in GWT and hosted on Google AppEngine/Java. In this application, users will have the option to upload video/audio/text files to the server. Those files could be big, up to 1 GB or so, and because GAE/J does not support large files I have to use another server to store them. This would be easy to implement if there were no cross-domain security feature in browsers. So, what I'm thinking is to make the GAE server talk to my server (Glassfish, or any other Java server if needed) to tell it the URL of the file and, if possible, send the status of the uploaded file (what percentage has been uploaded) so I can show the status on the client's screen. Here is what I'm thinking of doing.
When a user loads the GWT page that is stored on GAE/J, they will upload the file to my server; then my server will send a response back to GAE, and GAE will send a response to the client.
If this scenario is possible what would be the best way to implement GAE to Glassfish conversation?
Actually, before that, you could try the first approach by bypassing the browsers' cross-domain security with an iframe. There are some ready-to-use components for this, but I don't know which of them would fit your problem. Just google for these components...
To do it the original way you suggested, use the URL Fetch Service.
The down side to doing it the other way is that you introduce dependencies on multiple sites inside your web pages.
The downside of using the URL Fetch Service is that you have to pay by number of bytes transferred after you have reached the free quota.
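For reference, a small sketch of calling the Glassfish server from GAE with the URL Fetch Service; the status URL is hypothetical:

    import java.net.URL;
    import com.google.appengine.api.urlfetch.HTTPMethod;
    import com.google.appengine.api.urlfetch.HTTPRequest;
    import com.google.appengine.api.urlfetch.HTTPResponse;
    import com.google.appengine.api.urlfetch.URLFetchService;
    import com.google.appengine.api.urlfetch.URLFetchServiceFactory;

    public class StatusCheck {
        // Ask the file server for the status of an upload and return its body
        public static String fetchStatus(String uploadId) throws Exception {
            URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
            HTTPRequest request = new HTTPRequest(
                    new URL("https://files.example.com/status?uploadId=" + uploadId),
                    HTTPMethod.GET);
            HTTPResponse response = fetcher.fetch(request);
            return new String(response.getContent());
        }
    }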
One option would be to wait - the blobstore limit won't always be 50MB!
If you're in a hurry, though, I would suggest an approach like the following:
Have your App Engine app generate a signed token that signifies the user has permission to upload a file. The token should include the current date and time, the user's user ID, the maximum file size, and any other relevant information, and should be signed using HMAC-SHA1 with a secret key that your App Engine app and your server both know.
Return a form to the user that POSTs to a URL on your blob hosting server, and embeds the token you generated in step 1. If you want progress notifications, you can use a tool like plupload, and serve the form in an IFrame served by your upload server.
When the user uploads the file to your server, the server should return a redirect back to your App Engine app, with a new token embedded in the redirect URL. That token, again signed with a common secret, contains the ID of the newly uploaded file.
When your App Engine app receives a request for the redirect URL, it knows the upload was completed, and can record the new file's ID etc in the datastore.
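A minimal sketch of the token signing and verification described above, using HMAC-SHA1 from the JCE; the payload format (e.g. userId|timestamp|maxSize) and the shared secret are assumptions:

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;

    public class UploadToken {
        // The same secret must be configured on both the App Engine app and the upload server
        private static final byte[] SECRET = "change-me".getBytes(StandardCharsets.UTF_8);

        // token = payload + "." + signature, so the receiving side can re-verify it
        public static String sign(String payload) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA1"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return payload + "." + Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        }

        // Re-sign the payload portion and compare; in production prefer a
        // constant-time comparison such as MessageDigest.isEqual
        public static boolean verify(String token) throws Exception {
            int dot = token.lastIndexOf('.');
            if (dot < 0) return false;
            return sign(token.substring(0, dot)).equals(token);
        }
    }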
Alternately, you can use Amazon's S3, which already supports all this with its HTML Form support.
