Microservice architecture for uploading and downloading multiple files at once

I'm trying to create a new microservice with Spring Boot for uploading and downloading multiple files at once.
These files (PDF, XML, ZIP, TIFF, ...) can, based on some conditions, be stored in a storage service like S3 or in another kind of storage. This microservice has to implement the logic to work out where the files are, download them temporarily to a local folder, and then return them to the client application.
The goal is to hide the retrieval logic and the type of storage where the files reside from the client applications.
Each of my business entities has several files associated with it, so for the upload API I was thinking of using a multipart request to send all the files of the same entity together.
I would like to do the same for the download API: given the ID of an entity the API has to return all the files associated with it.
I don't know what the best way to achieve this is.
I have seen that there is such a thing as a multipart response, but I don't know if it is reliable.
Another idea is to download the files to a temporary shared folder and send the client application the list of paths where they are.
Yet another is to always download the files to a local (not shared) folder and send the client application the list of URLs it has to use to get them.
What do you think about it? Any other option?
Thanks for your help!

If you're looking to hide the fact that you're using S3 as the storage backing, my guess is that you're trying to either a) ensure you have the flexibility to change the storage backend at a later date, and/or b) put your own authentication in front of the upload/download to ensure users have permissions to read/write contents.
In either case, this sounds like a good job for a performant API gateway to ensure you maximize throughput. Instead of writing a custom service, you can write a configuration for something like Traefik that would a) authenticate requests, b) proxy the request to S3 directly, and c) rewrite the host and path to mask the usage of S3 as a storage backend. If you choose to use Traefik, take a look at the Routers section and the ReplacePathRegex middleware.
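
To make step (c) concrete, here is a rough sketch in plain Java of the kind of regex rewrite such a middleware performs. It is not Traefik configuration; the `/files/` prefix and the bucket name `example-bucket` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a gateway-style path rewrite; the prefix and bucket name are hypothetical. */
final class PathRewriter {
    // Clients see /files/<key>; the storage backend expects /<bucket>/<key>.
    private static final Pattern PUBLIC_PATH = Pattern.compile("^/files/(.+)$");
    private static final String BACKEND_PREFIX = "/example-bucket/";

    /** Returns the backend path, or null when the request is not a file route. */
    static String toBackendPath(String requestPath) {
        Matcher m = PUBLIC_PATH.matcher(requestPath);
        return m.matches() ? BACKEND_PREFIX + m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(toBackendPath("/files/entity-42/invoice.pdf"));
    }
}
```

Because clients only ever see the `/files/...` form, the backend prefix can change later without touching them.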

Related

How to access files in Laravel storage from a third-party application (a Java app for instance) with some basic password protection/authentication?

I have been working with Laravel for a little less than a month, so I am not aware of all the pre-defined functionality it has to offer. I have a bunch of CSV files in my Laravel application storage and I want to access them from another application (a Java application that processes those CSV files to produce some results).
What would be the best way to go about it?
I have a basic user management system set up and the users fill in an application form (which is where the csv files come from). These files are stored in the Laravel storage.
My current approach without using any built-in authentication (because I am not confident about how to use it in this case) is to have a controller return a downloadable file on a POST request (the file just gets downloaded upon request). The data sent with the POST request is the filename and a password, which if correct, returns the file; otherwise, gives an error. Is this a good way to approach the problem?
I simply want to retrieve the files by making a request from the Java application. Also, some basic protection is required so that everyone cannot access the files by making such requests. Any help or resources would be helpful. Thanks!

Use a DigitalOcean Space as additional shared storage between the two servers (PHP and Java). Make access to the Space private in the DigitalOcean dashboard, then add a CORS rule in the Space settings that lists your Java application's domain and the HTTP verbs it needs (GET, POST, DELETE, ...). With this configuration, both servers can access the cloud storage safely using an access key and secret key.

How to properly use s3 to deliver and store files in a web application?

So we are planning to move static content to s3 for operational reasons. I just want to understand where to place s3 in the workflow of handling a request.
If the website requires an image, should the request hit our service first, which would then fetch the image from S3 (reverse proxy), or should the client request the file directly?
How to hide file names and path names, and manage permissions, in a request for a file?
The same questions apply to uploading new content.
How to handle S3 quota and parallel requests?

I was going to comment, but this turned into a full answer instead...
Either. If your assets are public, the lowest-weight method is to just request them from a public S3 bucket. If they're not, though, it's probably easiest to use Cloudfront rather than rolling-your-own auth around S3 requests.
You can make it look like your asset A.jpeg in S3.yourBucket/A.jpeg is at yourWebsite.com/A.jpeg using Cloudfront. If you want to also obscure the filename A, you need to use e.g. API gateway to serve you the file without revealing anything about it to your front end. If it were me, I wouldn't bother.
Unless you absolutely have to, don't let users upload to the same bucket that other users download from. There are several approaches to uploads depending on the use case. Pre-signed URLs are good for one-time use. You can also use Cognito to give the user AWS credentials that only allow writing to the upload bucket.
There's no S3 quota. You get charged for reads and writes. For a simple site, these charges will be tiny. If you're worried, you can use Cloudfront to rate-limit your users. You can also use API Gateway to create limits for individual users. S3 is extremely parallelizable.

How can I connect to Alfresco documents through custom web application

Our java web application uses Alfresco as DMS. The application uses one single systemuser to connect to Alfresco. The application manages the access rights itself with some Business Logic.
Now what I'd like to accomplish is to be able to use the MS Office URIs to do online editing of Word documents that live in Alfresco. That is, for example, a URL that looks like ms-word:ofe|u|https://ourwebapp.com/documents/mydocument.docx
However if we open our documents like this, the user would end up being able to do stuff on Alfresco that we don't want them to do.
Because we want to keep our documents safe and secure, we don't want the users to be able to get the Alfresco documents "directly", but through our app. Opening Alfresco documents directly would mean that each individual user should get a unique Alfresco username/password and we don't have that and we don't want that because we already have lots and lots of documents living in Alfresco.
Surely there are other companies running into this problem? I.e. using their DMS with one single system user?
What I've already tried is to make a REST endpoint. A Spring Filter ensures that an authorisation header with username/password is added, and the request is forwarded to Alfresco. Then the response from Alfresco is passed back to the user. However, this results in a document that's opened in read-only mode at best. Furthermore, it doesn't seem very secure to set up a connection with the user using the system user's credentials. For all I know, the user will be able to do things in Alfresco he isn't supposed to do, like editing or even viewing other documents.
There's very little documentation on how the ms-word protocol exactly works, maybe you can point me in the right direction? Or suggest some workarounds I might try out?

For this to work using the SharePoint protocol (SPP), you would have to reimplement the whole protocol server in front of your application, since you control the access. I know of no free or otherwise available SPP implementation that you could reuse for this.
The Alfresco protocol server may not be an option, since you can't or don't want to mirror your app's access control into Alfresco. If you access a system like Alfresco or SharePoint through a file protocol, users get too many access rights, as you already described. By following the application-user concept, you may be locked out of Alfresco's end-user features if you can't mirror the access logic into Alfresco.
Years ago we implemented a dynamic low level access voter to up- or downgrade access inside Alfresco's node service to allow specific permissions based on types and metadata. The same way someone could implement an interface to another system to delegate permission checks based on external data but this would slow down all the systems involved dramatically.
We have a similar requirement, since we access documents and data from several enterprise sources, including Alfresco, from our own business process product. Its access concept is rule- and process-based, built on the cases and processes the documents are involved in, not on folders or a document's static ACLs. We use a local service installed on the client that works with the browser app to download, open, and save back documents from a local temporary (checked-out) path once the file is closed. Our local client knows nothing about Alfresco and authenticates only against our services using JSON Web Tokens.
So my answer is more a concept not a ready to go solution in the hope to be helpful.

What's the best practice for storing encryption keys in a Java Tomcat web app for encrypting/decrypting sensitive data in a database?

We have run into a requirement to encrypt certain sensitive data before storing it in a PostgreSQL database. Our application is a Spring app running on top of Tomcat. We need to store the keys so that our web app can encrypt data before inserting it and decrypt data after retrieving it.
We would like to avoid storing this type of thing in our config files (or files that are filtered by Maven) since those files are checked into source control and readable by anyone with access. We seem to have the same issue with database credentials but it looks like JNDI datasources might solve those.
Are there best practices for this when using Tomcat? Things like environment variables might work but would be cumbersome to maintain.
EDIT: To clarify, we're trying to prevent sensitive data loss due to unauthorized access to the actual database file. Our db lives on a different physical box from our app server so we feel comfortable keeping the keys on the server that the application server lives on.

As far as I have seen, all the best practices around credentials optimize ease-of-use (for the operations team) and code simplification (for the development team), rather than optimize for security. This is very disappointing.
I would state the requirement as: Do not ever store any credentials or encryption keys in an unencrypted form, whether embedded in properties/config files, source code, or files managed separately from the application.
Some effects of this requirement:
JNDI data sources do not work for DB credentials, because the configuration of the JNDI data source resides in a configuration file in the container.
Passwords/passphrases for encryption/decryption (e.g., for the data in a database) cannot reside in the code or a properties file.
You must implement a way for the operations team to pass the secrets to the application as it is instantiated.
How might you design a system to meet this requirement?
Store all of your secrets in an encrypted file in a centralized location (e.g., in Amazon S3 or a Hardware Security Module). The passphrase/credentials to read this encrypted data must never be stored digitally - it is provided (e.g., in the body of a POST) when the app spins up. In addition to the passphrase to the secrets file, you may also need to dynamically provide credentials to get access to the file itself.
When the application receives the passphrase to the secrets file, it cracks it open and initializes its internals (e.g., creates DB data sources, initializes the encryption services, etc.).
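
As a rough illustration of the "crack it open" step, here is a minimal sketch using only the JDK's built-in crypto. It assumes the secrets blob was encrypted with AES-GCM under a key derived from the passphrase via PBKDF2, and that the salt and IV are stored in front of the ciphertext. The class name, blob layout, and iteration count are my own assumptions, not an established format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical secrets-file helper. Assumed blob layout: salt | iv | ciphertext+tag. */
final class SecretsFile {
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final int ITERATIONS = 100_000; // assumed work factor

    // Derives a 256-bit AES key from the passphrase supplied at startup.
    private static SecretKeySpec deriveKey(char[] passphrase, byte[] salt)
            throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(passphrase, salt, ITERATIONS, 256);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    /** What the operations team would run once to produce the encrypted file. */
    static byte[] encrypt(char[] passphrase, byte[] plaintext) throws GeneralSecurityException {
        byte[] salt = new byte[SALT_LEN];
        byte[] iv = new byte[IV_LEN];
        SecureRandom random = new SecureRandom();
        random.nextBytes(salt);
        random.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(passphrase, salt), new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] blob = new byte[SALT_LEN + IV_LEN + ciphertext.length];
        System.arraycopy(salt, 0, blob, 0, SALT_LEN);
        System.arraycopy(iv, 0, blob, SALT_LEN, IV_LEN);
        System.arraycopy(ciphertext, 0, blob, SALT_LEN + IV_LEN, ciphertext.length);
        return blob;
    }

    /** What the app runs when it receives the passphrase; a wrong passphrase throws. */
    static byte[] decrypt(char[] passphrase, byte[] blob) throws GeneralSecurityException {
        byte[] salt = Arrays.copyOfRange(blob, 0, SALT_LEN);
        byte[] iv = Arrays.copyOfRange(blob, SALT_LEN, SALT_LEN + IV_LEN);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(passphrase, salt), new GCMParameterSpec(128, iv));
        return cipher.doFinal(blob, SALT_LEN + IV_LEN, blob.length - SALT_LEN - IV_LEN);
    }

    public static void main(String[] args) throws Exception {
        byte[] blob = encrypt("example passphrase".toCharArray(),
                "db.password=example".getBytes(StandardCharsets.UTF_8));
        byte[] secrets = decrypt("example passphrase".toCharArray(), blob);
        System.out.println(new String(secrets, StandardCharsets.UTF_8));
    }
}
```

GCM authenticates the data, so a mistyped passphrase fails loudly at startup instead of yielding garbage credentials. How the passphrase reaches the app (the POST mentioned above) is left out here.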

You don't need encryption. You need access control.
Best practice? Yes. Cheap? No:
Don't use a single web server user account to log in to the db. This allows privilege escalation and data theft via SQL Injection. Use a unique connection per user. This kills connection pooling and allows maybe 500 simultaneous users max per db box.
Use a database that supports Row Level Security and Column Level Security. Right now, this is just Oracle. PG9.5 will have it. You can mimic it in 9.4
Put the db in its own network zone. Only the web server can connect.
Use a DAM (database activity monitor) to look for bad queries in and too much data out
A guide: https://securosis.com/assets/library/reports/Securosis_Understanding_DBEncryption.V_.1_.pdf

How to retrieve images stored in file server?

I have a java web service through which I upload images to a file server. I want to access these images from my java web app. How can I make the image files (and eventually other static files) available from this file server?
The only thing I could think of was to use Apache Http server as a proxy to my web app for these images, but that circumvents the security measures of the web app.
UPDATE:
Servlet container: Tomcat
Web app is on separate server from images.
Web service is on same server as images and has direct access to file system.
Both the web app and the service use Spring Security for authentication/authorization, and I want to continue to use this security framework for image access.

How are the files stored?
If security is a concern the best option might be to create a Servlet (or something similar) which will load up the image and serve it to the user, once it has checked their credentials.
How you load the image depends on exactly how the files are stored. If you can access them via HTTP, you can always open a URLConnection to the file from the Servlet and serve it directly that way (i.e. using the Servlet as a sort of proxy server).
Without more details it's difficult to be specific.
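
To give a rough idea of the check-then-serve flow anyway, here is a framework-free sketch. It assumes the images sit on a file system the code can read (as the question's update says the web service can), and it stands in for the Servlet with a plain method. `ImageProxy`, `serve`, and the `isAllowed` predicate are hypothetical names; in a real Servlet the output stream would be the response's, and the predicate would delegate to your security framework.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;

/** Hypothetical stand-in for a Servlet that checks credentials before serving a file. */
final class ImageProxy {
    /** Writes the file to out only for allowed users; returns an HTTP-style status code. */
    static int serve(String user, Path baseDir, String name,
                     Predicate<String> isAllowed, OutputStream out) throws IOException {
        if (!isAllowed.test(user)) {
            return 403; // credentials are checked before the file system is touched
        }
        Path base = baseDir.normalize();
        Path file = base.resolve(name).normalize();
        // Reject names such as "../secret" that would escape the image directory.
        if (!file.startsWith(base) || !Files.isRegularFile(file)) {
            return 404;
        }
        Files.copy(file, out); // stream the bytes straight through
        return 200;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("images");
        Files.write(dir.resolve("logo.png"), new byte[] {1, 2, 3});
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int status = serve("alice", dir, "logo.png", u -> u.equals("alice"), out);
        System.out.println(status + " " + out.size() + " bytes");
    }
}
```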

I'm not sure if this will solve your problem, but it sounds like you should set up a context path that will map a URL to the path on your server. This can be done with tomcat's context files.
For a good explanation of the solution, check out a post on How to Program with Java

Sounds similar to Apache Hadoop.
Once image/file is requested, you have to make API call and pull the file out and do one of the following:
Store the temp file in a "temp" directory on a web-accessible server. You will need some kind of cleaner/GC running in the background to clean up those temp files. This is how Facebook does it with photos.
Instead of storing file on the server check the file type and set HTTP Content-type header to the appropriate file type. Image source will look like this <img src="getPicture.jsp?id=1234" />
