The background:
I have a Java Spring Boot application.
I would like to have an API endpoint where the user can request to download a file.
The file is generated by the application and stored in some remote storage (S3, for example) prior to the download request.
The customer does not have access to the remote storage.
He sends a request to the application, and as far as he is concerned, the file is coming from the application.
The downloaded file can be very big.
The question:
Is there a way to use the application as some sort of tunnel?
Meaning: the application gets a request, starts downloading the file chunk by chunk, and every downloaded chunk is also returned to the client.
This would create an experience similar to downloading from S3 itself, and it does not force the application to save the file locally or load the whole thing into memory.
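A minimal sketch of such a tunnel, assuming the AWS SDK v2 S3Client and Spring MVC's StreamingResponseBody; the endpoint path and bucket name are made-up placeholders:

import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.servlet.mvc.method.annotation.StreamingResponseBody;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

@RestController
public class FileDownloadController {

    private final S3Client s3 = S3Client.create();

    @GetMapping("/files/{key}")
    public ResponseEntity<StreamingResponseBody> download(@PathVariable String key) {
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket("my-bucket") // placeholder bucket name
                .key(key)
                .build();
        // The lambda runs while the response is open: each chunk read from S3
        // is written straight to the client, so only one buffer is in memory.
        StreamingResponseBody body = out -> {
            try (ResponseInputStream<GetObjectResponse> s3Stream = s3.getObject(request)) {
                s3Stream.transferTo(out);
            }
        };
        return ResponseEntity.ok()
                .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + key + "\"")
                .contentType(MediaType.APPLICATION_OCTET_STREAM)
                .body(body);
    }
}

StreamingResponseBody is written out asynchronously after the controller returns, so the heap holds only a buffer's worth of the file at a time; for production use, Spring will want an async task executor configured.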
Related
This is related to an older question:
My application uses the AWS Java S3 async client for multipart uploads.
When the application pauses and resumes the upload it works fine, but when the application is restarted a new uploadId is generated and completing the upload fails (Exception: One or more of the specified parts could not be found. The part may not have been uploaded, or the specified entity tag may not match the part's entity tag.), probably because some parts were uploaded using a different uploadId.
The same happens when using the AWS CLI to complete the upload.
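For what it's worth, a multipart upload can only be completed with parts uploaded under one uploadId, so the usual fix is to persist the uploadId across restarts and resume with it rather than creating a new multipart upload. A minimal sketch, assuming the AWS SDK v2 S3AsyncClient; pagination of ListParts is omitted for brevity:

import java.util.List;
import java.util.stream.Collectors;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.model.*;

public class ResumeUpload {
    public static void complete(S3AsyncClient s3, String bucket, String key, String uploadId) {
        // Ask S3 which parts already exist under the ORIGINAL uploadId
        ListPartsResponse listed = s3.listParts(ListPartsRequest.builder()
                .bucket(bucket).key(key).uploadId(uploadId).build()).join();

        List<CompletedPart> parts = listed.parts().stream()
                .map(p -> CompletedPart.builder()
                        .partNumber(p.partNumber())
                        .eTag(p.eTag())
                        .build())
                .collect(Collectors.toList());

        // ...upload any missing parts with the SAME uploadId, then:
        s3.completeMultipartUpload(CompleteMultipartUploadRequest.builder()
                .bucket(bucket).key(key).uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder().parts(parts).build())
                .build()).join();
    }
}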
In my web application I have a link which, when clicked, invokes an external web service to retrieve a download URL for a file.
I need to send back to the client the file behind this URL, instead of the download URL retrieved from the web service. If possible, I would also like to do it without having to download the file to my server beforehand.
I've found this question about a similar task, but it used PHP with the readfile() function.
Is there a similar way to do this in Java 8?
If you don't even want to handle that file, you should answer the request with a redirect (e.g. HTTP 301 or 302). If you do want to handle the file, you should read it into a byte buffer and send it to the client, which will make the transfer slower.
Without seeing your implementation so far, this is my best suggestion.
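If a redirect is not an option, here is a minimal Java 8 sketch that streams the remote file through to the client without buffering it whole; fetchDownloadUrl() is a hypothetical stand-in for your web service call, and the filename is a placeholder:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.servlet.http.HttpServletResponse;

public void proxyDownload(HttpServletResponse response) throws IOException {
    String downloadUrl = fetchDownloadUrl(); // hypothetical: result of the external web service call
    URLConnection connection = new URL(downloadUrl).openConnection();
    response.setContentType("application/octet-stream");
    response.setHeader("Content-Disposition", "attachment; filename=\"download.bin\""); // placeholder name
    try (InputStream in = connection.getInputStream();
         OutputStream out = response.getOutputStream()) {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read); // forward each chunk as it arrives
        }
    }
}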
I have gone through previous posts where similar questions were asked, but I couldn't find a solution to my problem.
In my application users can download files. When a user clicks download, our application server internally sets up an authenticated session with the web service, which sends the file data in an XML response as below:
<FileSerial xmlns="http://my.example.com/webservices">
<filedata>base64Binary</filedata>
<filesize>int</filesize>
<filetype>string</filetype>
<mime_type>string</mime_type>
</FileSerial>
And I have used Spring-WS as below:
// Note: the entire base64-encoded payload is unmarshalled into memory here
GetDocResponse docResponse = (GetDocResponse) webServiceTemplate.marshalSendAndReceive(getDoc);
FileSerial fileSerial = docResponse.getGetDocResult();
fileByte = fileSerial.getFiledata();
After several users hit the download, our application server's JVM memory usage goes very high, the server stops responding, and it has to be restarted.
My guess is that fileByte is held in application server memory and that's causing the issue.
Is there any way to stream the file directly to the client browser without storing it in application server memory?
Any sample code will be of help.
You are loading the full document onto your heap, plus the base64 conversion. If you don't copy by reference, every mapping from your binary data to another object creates another entry on your heap.
You should use a multipart message and send the document as an attachment of your web service request.
MTOM example
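A minimal sketch of switching the Spring-WS client to MTOM, assuming JAXB2 (un)marshalling; the context path is a placeholder. With MTOM the binary travels as a raw attachment instead of inline base64:

import org.springframework.context.annotation.Bean;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Bean
public Jaxb2Marshaller marshaller() {
    Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
    marshaller.setContextPath("com.example.generated"); // placeholder: package of generated classes
    marshaller.setMtomEnabled(true); // send/receive binary as an MTOM attachment, not inline base64
    return marshaller;
}

With MTOM enabled, the generated filedata field can be mapped to a javax.activation.DataHandler annotated with @XmlMimeType("application/octet-stream"), which lets you stream the attachment onward instead of materializing a byte array on the heap.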
In my Java web application I have to process Excel files from users. There are two ways to process them: first as a File, second as an InputStream.
I think the InputStream approach will be memory-consuming.
Is there any possible threat if I first save the user-uploaded file as .xls or .xlsx and then process it?
What are the pros and cons of both approaches?
The best way to process files in a web application is after the upload has completed and the file has been saved on your server.
Streaming file processing should be avoided because HTTP is designed as a request/response model. You shouldn't make the web client wait until you finish processing the file.
The best thing to do is save the uploaded file to a directory and send the web client an upload-success message, possibly with a link where the end user can check for the results later.
Then have a scheduled task process the files in the upload directory and post the results to the results page, as in the sketch below.
This way the web application avoids unnecessary delays and remains scalable.
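A minimal sketch of that flow, assuming Spring's MultipartFile and @Scheduled (which requires @EnableScheduling on a configuration class); the upload directory is a placeholder and the POI comment stands in for the real processing:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class ExcelUploadController {

    private static final Path UPLOAD_DIR = Paths.get("/var/uploads"); // placeholder directory

    @PostMapping("/upload")
    public String upload(@RequestParam("file") MultipartFile file) throws IOException {
        // In production, sanitize the original filename before using it
        Path target = UPLOAD_DIR.resolve(file.getOriginalFilename());
        file.transferTo(target.toFile()); // save to disk and reply immediately
        return "Upload received; check the results page later";
    }

    @Scheduled(fixedDelay = 60_000)
    public void processUploads() throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(UPLOAD_DIR, "*.{xls,xlsx}")) {
            for (Path excelFile : files) {
                // e.g. open with Apache POI's WorkbookFactory.create(excelFile.toFile()),
                // then move the file out of the directory and publish the results
            }
        }
    }
}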
I have a few questions related to web technologies. From my reading and from looking at the Apache and Netty documentation, I could not figure out a few things about downloading a large file with an HTTP multipart/POST request.
Is it possible to send an HTTP request indicating that a file should be downloaded in smaller multipart chunks?
How do I download a large file in multiple parts?
Please correct me if I have not understood the term 'multipart' itself. I know a lot of people have faced this problem, where the application (client) downloads a file in smaller portions, so that when a network outage happens the application does not need to download the whole file from the beginning again, especially when the file is not a media file.
Thanks.
Multipart refers to encoding multiple documents in one body; see this for the definition. For HTTP, a multipart upload allows the client to send multiple documents with one POST, for example uploading an image and form fields in one request.
Multipart does not refer to downloading a document in multiple chunks.
You can use HTTP range requests to resume the download if a network outage occurs.
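A minimal sketch of resuming an interrupted download with a Range header, assuming the server advertises Accept-Ranges: bytes; the URL and local file name are placeholders:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ResumeDownload {
    public static void main(String[] args) throws Exception {
        Path partial = Paths.get("download.part"); // placeholder local file
        long have = Files.exists(partial) ? Files.size(partial) : 0;

        HttpURLConnection conn = (HttpURLConnection)
                new URL("https://example.com/big.bin").openConnection(); // placeholder URL
        conn.setRequestProperty("Range", "bytes=" + have + "-"); // ask only for the missing tail

        if (conn.getResponseCode() == 206) { // 206 Partial Content: the server honored the range
            try (InputStream in = conn.getInputStream();
                 OutputStream out = Files.newOutputStream(partial,
                         StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n); // append the resumed bytes to the partial file
                }
            }
        }
    }
}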