download of a large file from webservice causing application performance issue


I have gone through the previous posts, and similar questions have been asked, but I couldn't find a solution to my problem.
In my application, users can download files. When a user clicks download, our application server internally sets up an authenticated session with the web service, which sends the file data in an XML response like this:
<FileSerial xmlns="http://my.example.com/webservices">
<filedata>base64Binary</filedata>
<filesize>int</filesize>
<filetype>string</filetype>
<mime_type>string</mime_type>
</FileSerial>
I call the service with spring-ws as follows:
GetDocResponse docResponse = (GetDocResponse) webServiceTemplate.marshalSendAndReceive(getDoc);
FileSerial fileSerial = docResponse.getGetDocResult();
fileByte = fileSerial.getFiledata();
After several users start downloads, our application server's JVM memory usage climbs very high, the server stops responding, and it has to be restarted.
My guess is that fileByte is held in the application server's memory and that this is causing the issue.
Is there any way to stream the file directly to the client's browser without storing it in the application server's memory?
Any sample code would help.

You are loading the full document onto your heap, plus the base64 conversion. Unless you pass the data by reference, every mapping of your binary data to another object creates another copy on your heap.
You should use a multipart request and send the document as an attachment of your WS request.
MTOM example
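If switching to MTOM is not an option, the base64 overhead can at least be kept out of the heap by decoding as a stream. The following is only a sketch using the JDK's `java.util.Base64`; it assumes you can get at the encoded `filedata` content as an `InputStream` (for example by reading the response with a streaming XML parser instead of unmarshalling it into a `byte[]`), which the code in the question does not do. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

// Hypothetical helper, not part of spring-ws.
final class Base64StreamDecoder {

    // Decodes base64 text read from `encoded` and writes the raw bytes to `out`
    // through a fixed 8 KiB buffer, so memory use does not grow with file size.
    // Returns the number of decoded bytes written.
    static long decodeTo(InputStream encoded, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        try (InputStream decoded = Base64.getDecoder().wrap(encoded)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        out.flush();
        return total;
    }
}
```

In a servlet, `out` would typically be the response's output stream, so each decoded chunk goes to the browser as soon as it is read.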

Related

Downloading big files and passing them back to the client in chunks

The background:
I have a Java Spring boot application.
I would like to have an API endpoint where the user can request to download a file.
The file is generated by the application and stored in some remote storage (S3, for example) prior to the download request.
The customer does not have access to the remote storage.
They send a request to the application and, as far as they are concerned, the file comes from the application.
The file downloaded can be very big.
The question
Is there a way to have the application be used as some sort of a tunnel?
Meaning: the application gets a request, starts downloading the file chunk by chunk, and returns every downloaded chunk to the client as it arrives.
This would create an experience similar to downloading from S3 itself, without forcing the application to save the file locally or load the whole thing into memory.

java - send a file to client by file URL without download on the server

In my web application I have a link which, when clicked, invokes an external web service to retrieve a download URL for a file.
I need to send the client the file behind this URL, not the download URL returned by the web service. If possible, I would also like to do it without having to download the file to my server beforehand.
I've found this question about a similar task, but which used PHP with the readfile() function.
Is there a similar way to do this in Java 8?

If you don't want to handle the file at all, you should answer the request with a redirect (e.g. HTTP 301 or 302). If you do want to handle the file, you should read it through a byte buffer and send it on to the client, which will make the transfer slower.
Without seeing your implementation so far, this is my best suggestion.
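The "read through a byte buffer and send it to the client" option could look roughly like the sketch below. It uses only the JDK and sticks to Java 8 APIs; `StreamRelay` and its methods are hypothetical names, and the 8 KiB buffer size is an arbitrary choice. Only one buffer's worth of data is held in memory at a time, and nothing is written to the server's disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

// Hypothetical helper for relaying a remote file to a client.
final class StreamRelay {

    // Copies everything from `in` to `out` in chunks of `bufferSize` bytes.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Opens the download URL and relays its content to the client's stream.
    static long relay(URL source, OutputStream clientOut) throws IOException {
        try (InputStream in = source.openStream()) {
            return copy(in, clientOut, 8192);
        }
    }
}
```

In a servlet or controller, `clientOut` would usually be `response.getOutputStream()`, with the content type and any known content length set on the response before the first write.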

Java Spring: Real-time status update to the client over REST API

I am developing a web application in Java Spring where I want the user to be able to upload a CSV file from the front-end and then see the real-time progress of the importing process and after importing he should be able to search individual entries from the imported data.
The importing process would consist of actually uploading the file (sending it via REST API POST request) and then reading it and saving its contents to a database so the user would be able to search from this data.
How could I show the real-time progress of this process? I found a tutorial for jQuery that shows the progress of the amount of data uploaded/transferred, but since most of the work is done while processing the uploaded file, I would prefer a solution where I determine the number of lines in the file before processing starts, so that the user can see a live message like:
Lines processed: 1 out of 10000
It could update/change incrementally, but as one line is processed pretty quickly, showing each number of lines processed is not that important.
Either way, the question is, what's the easiest way to send these messages from Spring REST API to the client?

I found a solution myself and used Web Sockets for that.
I used this approach from the Spring documentation:
https://spring.io/guides/gs/messaging-stomp-websocket/
Once the WebSocket topic/connection is set up, this can be used to send a message to the front-end listener for each processed line. In my case I used batch inserts for the data import, so per-line messages were not available to me, but WebSockets are capable of it.
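Independent of the transport, the server-side part of "Lines processed: X out of N" can be sketched as a loop that reports progress every so many lines. This is only an illustration with the JDK; `ImportProgress`, `lineHandler` and `progress` are hypothetical names, and the total line count is assumed to have been determined beforehand, as the question suggests.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical helper: processes lines and reports progress in batches.
final class ImportProgress {

    // Passes each line to `lineHandler` and calls `progress` with
    // (processed, totalLines) after every `reportEvery` lines and once
    // more when the last line has been processed.
    static long importLines(Reader source, long totalLines, int reportEvery,
                            Consumer<String> lineHandler,
                            BiConsumer<Long, Long> progress) throws IOException {
        long processed = 0;
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            lineHandler.accept(line);
            processed++;
            if (processed % reportEvery == 0 || processed == totalLines) {
                progress.accept(processed, totalLines);
            }
        }
        return processed;
    }
}
```

With the STOMP setup from the Spring guide, the `progress` callback could forward each update to a topic (for example through `SimpMessagingTemplate.convertAndSend`), and reporting per batch rather than per line also fits a batch-insert import.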

Where do file upload streams get the content from?

I have a question regarding file upload, which is more related to how it works rather than a code issue. I looked on the internet, but I couldn't find a proper answer.
I have a web application running on Tomcat, which handles file uploads (through a servlet). Let's say I now want to upload huge files (> 1 GB). My understanding was that the multipart content of the HTTP request becomes available in my servlet once the whole file has actually been transferred.
My question is: where is the content of the request actually stored? When one calls HttpServletRequest.getParts(), an InputStream is available on the Part object. However, where is the stream reading from? Does Tomcat store it somewhere?
I guess this might not be clear enough, so I'll update the post according to your comments, if any.
Thanks

Tomcat stores Parts in the "X:\some\path\Tomcat 7.0\temp" (/some/path/apache-tomcat-7.0.x/temp) directory.
When a multipart request is parsed, if the size of a single part exceeds a threshold, a temporary file is created for that part.
Your servlet/JSP will be invoked when the transfer of all parts has completed.
When the request is destroyed, all temporary files are deleted as well.
If you are interested in the multipart parsing phase, take a look at Apache commons-fileupload (specifically ServletFileUpload.parseRequest()); Tomcat is based on a variant of that.
UPDATE
you can configure it as a java arg, ie in windows:

The InputStream will typically read from a temporary file which is created by the multipart framework during the request. The temp file is normally stored in the application server's temporary area - as specified by the servlet context attribute javax.servlet.context.tempdir. In Tomcat this is somewhere beneath $CATALINA_HOME/work. The file will be deleted once the request completes.
For small file sizes, the multipart framework may keep the whole upload in memory - in which case the InputStream will be reading directly from memory.
If you're using Spring's CommonsMultipartResolver then you can set the maximum upload size allowed in memory via the maxInMemorySize property. If an upload is bigger than this, then it will be stored as a temp file on disk.
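The memory-or-temp-file behaviour described above can be illustrated with a small JDK-only sketch. This is not Tomcat's or commons-fileupload's actual code; `ThresholdBuffer` and the `upload-` temp file prefix are made up for the example. Data stays in memory until the configured threshold would be exceeded, then everything is moved to a temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: buffers in memory up to `threshold` bytes, then spills to disk.
final class ThresholdBuffer extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream file;   // non-null once data has spilled to disk
    private Path tempFile;

    ThresholdBuffer(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] data, int off, int len) throws IOException {
        if (file == null && memory.size() + len > threshold) {
            spillToDisk();
        }
        if (file != null) {
            file.write(data, off, len);
        } else {
            memory.write(data, off, len);
        }
    }

    // Moves what was buffered so far into a temp file and releases the heap copy.
    private void spillToDisk() throws IOException {
        tempFile = Files.createTempFile("upload-", ".tmp");
        file = Files.newOutputStream(tempFile);
        memory.writeTo(file);
        memory = null;
    }

    boolean isInMemory() {
        return file == null;
    }

    Path getTempFile() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        if (file != null) {
            file.close();
        }
    }

    // Mirrors the cleanup that happens when the request completes.
    void delete() throws IOException {
        close();
        if (tempFile != null) {
            Files.deleteIfExists(tempFile);
        }
    }
}
```

The threshold here plays the role that settings such as `maxInMemorySize` play in the real frameworks.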

I think we should step back for a moment and think about the web infrastructure. First of all, HTTP transmits text data, so binary information is encoded in base64 so that the data won't get messed up. This ends up producing large amounts of data, which gives birth to the multipart form, which breaks the data into parts of encoded text with special markers that allow the server to assemble everything back together. But to use this data we have to decode it first, and to do that we have to use the multiple parts of the form.
[a break so we can breathe]
Continuing: the browser needs to send lots of data (1 GB, as you mentioned in your example). This data is encoded with base64 and then separated into pieces (the multipart form) with its markers. The browser then starts sending the pieces to the server, but the server only returns the HTTP response once it has finished receiving and processing the HTTP request (or if a timeout occurs, which results in an error on the browser screen).
What I can assume here is that Tomcat could (I didn't check the internals) start decoding each part of the multipart that has already arrived (either from the temp file or from memory) and pass the InputStream to the user. Since reading from the InputStream is a blocking operation, the server would wait for the next piece of data to pass to Tomcat, which in turn would pass it to the program that is processing the data.
Once all the data has reached the server, the program would prepare the response that Tomcat returns to the browser, completing the HTTP request-response cycle and closing the connection (since HTTP is a connectionless protocol).
Hope it helps :)

Tomcat follows the Servlet 3.0 specification which allows you to specify things such as how large of a multipart "part" can be before it gets stored (temporarily) on the disk, where temporary files will be written, what the maximum size of a file is, and what the maximum size of the whole request can be. You can find all kinds of good information about configuring multipart uploads (in Tomcat or any other spec-3.0-compliant server) here and here.
Tomcat's implementation specifics aren't terribly relevant: it adheres to the spec. If the file to be uploaded is smaller than the threshold set, then you should be able to read the bytes of the file from memory (i.e. no disk involved). If the file is larger, then it will be written to disk, first (in its entirety) and then you can get the bytes from the container.
So if you want to receive a 1GiB file and don't have that kind of memory available (I wouldn't recommend allowing clients to fill-up your heap with 1GiB of uploaded data for each upload... easy DoS if you just start several simultaneous 1GiB uploads and you are toast), then Tomcat (or whatever container you are using) will read the file (again, in its entirety) onto the disk, and when your servlet gets control, you can read the bytes back from that file.
Note that the container must process the entire multipart request before any of your code really runs. That's to prevent you from breaking anything by partially-reading the request's InputStream or anything like that. Processing multipart requests is non-trivial, and it's easy to break things.
If you want to be able to stream large files for processing (e.g. huge XML files that can be processed serially), then you are going to want to handle the multipart parsing yourself. That way, you don't need a huge amount of heap to buffer the file and you don't need to store the file on the disk before you start processing it. (If this is your use-case, I suggest using HTTP PUT or HTTP POST and not using multipart requests.)
(It's worth mentioning that base64 encoding is not even mentioned in any specification for multipart processing. A few folks have mentioned base64 here, but I've never seen a standard web client use base64 for uploading a file using multipart/form-data. HTTP handles binary uploads just fine, thanks.)

Here it is:
1. The user's browser composes an HTTP multipart request.
2. The TCP/IP stack of the user's OS slices it into packets.
3. Routers across the internet pass those packets to your server.
4. The TCP/IP stack of your server's OS reassembles the payloads and passes them to the TCP port listener.
5. The Tomcat HTTP connector decodes the HTTP POST request from the TCP data (source code is https://github.com/apache/tomcat/tree/trunk/java/org/apache/coyote ).
6. The Tomcat HTTP connector wraps it in an HTTP request object and eventually forwards it to your servlet (https://github.com/apache/tomcat/blob/trunk/java/org/apache/catalina/connector/Request.java).
7. Before and while your code reads the content of the HTTP request, Tomcat buffers the HTTP request body internally.
8. Tomcat will not parse the multipart body before you call request.getParts() (https://github.com/apache/tomcat/blob/trunk/java/org/apache/catalina/connector/Request.java#L2561), so no temp files for parts exist before that call.
9. Tomcat stores uploaded files in the location given by the @MultipartConfig annotation in your servlet code, unless your code doesn't provide it and allowCasualMultipartParsing is set (http://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Common_Attributes).
10. Since allowCasualMultipartParsing is false by default, you should not need to worry about where Tomcat stores the files, though it is easy to dig out.
I mention 1~5 because it is important to understand the stream returned by request.getInputStream(), which was required before the Servlet 3.x request.getParts() feature. Typically, Tomcat delivers the request to the web app very soon; it does not need to wait for the client side to finish uploading, so Tomcat need not buffer a lot of data. I left Java server-side development some years ago, before JSR-000315 was approved :-)

Make two servers talk to each other

I have an application written in GWT and hosted on Google App Engine/Java. In this application the user will have the option to upload a video/audio/text file to the server. Those files could be big, up to 1 GB or so, and because GAE/J does not support large files I have to use another server to store them. This would be easy to implement if there were no cross-domain security feature in browsers. So what I'm thinking is to make the GAE server talk to my server (Glassfish or any other Java server if needed) to pass along the URL of the file and, if possible, the status of the upload (what percentage has been uploaded) so I can show the status on the client's screen. Here is what I'm thinking of doing.
When the user loads the GWT page stored on GAE/J, he/she uploads the file to my server; then my server sends a response back to GAE, and GAE sends a response to the client.
If this scenario is possible, what would be the best way to implement the GAE-to-Glassfish conversation?

Actually, before that, maybe you can try the first approach by bypassing the browser's cross-domain security with an iframe. There are some ready-to-use components for this, but I don't know which of them would suit your problem. Just google for these components...

To do it the original way you suggested, use the URL Fetch Service.
The down side to doing it the other way is that you introduce dependencies on multiple sites inside your web pages.
The downside of using the URL Fetch Service is that you have to pay by number of bytes transferred after you have reached the free quota.

One option would be to wait - the blobstore limit won't always be 50MB!
If you're in a hurry, though, I would suggest an approach like the following:
1. Have your App Engine app generate a signed token that signifies the user has permission to upload a file. The token should include the current date and time, the user's user ID, the maximum file size, and any other relevant information, and should be signed using HMAC-SHA1 with a secret key that your App Engine app and your server both know.
2. Return a form to the user that POSTs to a URL on your blob hosting server, and embeds the token you generated in step 1. If you want progress notifications, you can use a tool like plupload, and serve the form in an IFrame served by your upload server.
3. When the user uploads the file to your server, the server should return a redirect back to your App Engine app, with a new token embedded in the redirect URL. That token, again signed with a common secret, contains the ID of the newly uploaded file.
4. When your App Engine app receives a request for the redirect URL, it knows the upload was completed, and can record the new file's ID etc in the datastore.
Alternately, you can use Amazon's S3, which already supports all this with its HTML Form support.
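The signed token described above could be produced and checked along these lines. This is a minimal sketch using the JDK's `javax.crypto.Mac` with the `HmacSHA1` algorithm; the `UploadToken` class, the `payload.signature` token layout and the payload format are assumptions made for illustration, not something the answer prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical token helper shared by the App Engine app and the upload server.
final class UploadToken {
    private static final String ALGORITHM = "HmacSHA1";

    private static byte[] hmac(String payload, byte[] secret) throws GeneralSecurityException {
        Mac mac = Mac.getInstance(ALGORITHM);
        mac.init(new SecretKeySpec(secret, ALGORITHM));
        return mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
    }

    // Returns "<base64url(payload)>.<base64url(signature)>".
    static String sign(String payload, byte[] secret) throws GeneralSecurityException {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(payload.getBytes(StandardCharsets.UTF_8))
                + "." + enc.encodeToString(hmac(payload, secret));
    }

    // Returns the payload if the signature matches, otherwise null.
    static String verify(String token, byte[] secret) throws GeneralSecurityException {
        int dot = token.indexOf('.');
        if (dot < 0) {
            return null;
        }
        String payload;
        byte[] given;
        try {
            Base64.Decoder dec = Base64.getUrlDecoder();
            payload = new String(dec.decode(token.substring(0, dot)), StandardCharsets.UTF_8);
            given = dec.decode(token.substring(dot + 1));
        } catch (IllegalArgumentException e) {
            return null; // not valid base64url
        }
        // Constant-time comparison of expected and supplied signatures.
        return MessageDigest.isEqual(hmac(payload, secret), given) ? payload : null;
    }
}
```

The receiving side would still need to check the timestamp, user ID and maximum file size carried in the payload; the signature only proves that the payload came from a holder of the shared secret.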
