How to stream several large files to jetty - java

I would like to send several large files via HTTP using multipart/form-data.
I actually just want to stream them through my service, so I'd like to get the different parts as streams, and I absolutely want to avoid the whole request being buffered in memory before I get a chance to pass the data on.
I get the feeling that with Jetty (we're using Dropwizard 0.7.1, which comes with Jetty 9.0.7) the whole request gets buffered before my code is executed.
Is there a way to avoid that? Ideally, I'd like to have an event-based system (which fires an event like "next part with name xxx" and gives me a stream I can consume).

A request with multipart/form-data is processed by various internal components to break apart the sections so that HttpServletRequest.getParts() (and various similar methods) can work properly.
Option #1: Handle Multi-part yourself
It can be a bit tricky to subvert this behavior of the Servlet spec, but I'll give it a go.
First, do not declare the @MultipartConfig configuration for the servlet you want to handle this request data.
Next, do not access methods in the HttpServletRequest that need to know about the parameters of the request, or its parts.
Override the HttpServlet.service(HttpServletRequest, HttpServletResponse) method, not the doPost() method, and process the raw Request payload content yourself.
This means you'll be writing a multipart InputStream parser and handling the parsing of the multipart body yourself. There are plenty of examples of this online; pick the one that makes the most sense to you.
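As a starting point for such a parser, the part delimiter has to be read from the Content-Type header (for example the value returned by HttpServletRequest.getContentType()). The following is only a minimal sketch of that first step; the class and method names are made up for illustration, and scanning the body for the boundary is left to the parser you choose.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Jetty or the Servlet API.
final class MultipartBoundary {
    private MultipartBoundary() {}

    /** Returns the boundary parameter of a multipart Content-Type header, if present. */
    static Optional<String> boundaryOf(String contentType) {
        if (contentType == null
                || !contentType.trim().toLowerCase(Locale.ROOT).startsWith("multipart/")) {
            return Optional.empty();
        }
        // Parameters follow the media type, separated by ';'.
        // (';' is not a legal boundary character, so a plain split is enough here.)
        String[] params = contentType.split(";");
        for (int i = 1; i < params.length; i++) {
            String p = params[i].trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("boundary")) {
                String value = p.substring(eq + 1).trim();
                // The value may be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

In the body, each part is then introduced by a line consisting of "--" followed by this boundary value.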
Option #2: Don't use POST with multi-part
If you are streaming a file upload, don't use POST with multi-part; use PUT with raw payload data. That way you skip the entire layer of magic that is the multi-part POST payload.
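With a raw payload, passing the data on comes down to copying the request's input stream to the destination in fixed-size chunks, so only one buffer is held in memory at a time. A minimal sketch using only the JDK (the class name and the 8 KiB buffer size are arbitrary choices for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration.
final class StreamCopy {
    private StreamCopy() {}

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // the only buffer kept in memory
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In a servlet's doPut(), `in` would be request.getInputStream() and `out` whatever the service forwards the data to.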

Related

How to serve massive amount of data as a file in Java Spring MVC as it's being created?

My production servers have hundreds of users per block, and I've realized that exporting data could potentially exhaust memory and ruin the app for multiple users.
We are talking about millions of records being exported by a single user.
Is there a way to create a CSV file and stream it to the front end as it is being generated to use as little memory as possible?
Making the front end request batches and generate the CSV file in the front end is not an option, this call will be used for other platforms and I'm trying to make it as clean as possible for all.
If you look at the Spring Framework Documentation on Spring Web MVC, section 1.4.3. Handler Methods, sub-section Return Values, you will find many ways to return streaming data, e.g.
void - A method with a void return type (or null return value) is considered to have fully handled the response if it also has a ServletResponse, or an OutputStream argument, or an @ResponseStatus annotation.
ResponseBodyEmitter - Emit a stream of objects asynchronously to be written to the response with HttpMessageConverter's; also supported as the body of a ResponseEntity. See Async Requests and HTTP Streaming.
That means you can do it:
Synchronous: Write the raw response yourself to the HTTP response stream in the handler method. Response is complete when method returns.
Asynchronous: Prepare the streaming (including HTTP headers) in your handler method, then do the actual streaming in another thread.
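For the synchronous variant, the core is a loop that writes one row at a time to the output stream instead of building the whole file first. The sketch below uses only the JDK; the class name, the row type (an Iterator of string lists, e.g. backed by a database cursor), and the flush interval of 1000 rows are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration.
final class CsvStreamer {
    private CsvStreamer() {}

    /** Writes rows to out as they are pulled from the iterator; returns the row count. */
    static long write(Iterator<List<String>> rows, OutputStream out) throws IOException {
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        long count = 0;
        while (rows.hasNext()) {
            List<String> row = rows.next();
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    w.write(',');
                }
                w.write(escape(row.get(i)));
            }
            w.write("\r\n");
            count++;
            if (count % 1000 == 0) {
                w.flush(); // push data to the client periodically
            }
        }
        w.flush();
        return count;
    }

    /** Quotes a field if it contains a comma, a quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }
}
```

In a handler method with a void return type, `out` would be the OutputStream argument (or the response's output stream), with the Content-Type and Content-Disposition headers set before the first write.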

ServletRequest/ServletResponse manipulation

I want to implement an advanced Java servlet filter for processing batch requests on API server. Something similar to the Facebook batch request API. The idea is:
set up a servlet filter on a given URL
override doFilter(request, response), and in it:
    parse the list of partial requests from the body; for each:
        prepare the partial request
        call chain.doFilter(partialRequest, partialResponse)
        remember the partial response
    render the response with the list of partial responses
I am able to construct an HttpServletRequestWrapper for each partial request and create an HttpServletResponseWrapper with some output stream cheating, but this is a bit hard: I have to change almost all parts (path, body, headers, etc.).
Is there a good library for request/response manipulation, or a better request/response wrapper class?
I understand that you want to consolidate as many requests as possible into one, but I don't think you would de-consolidate them on the back-end.
I think your approach complicates things, and I'm not even sure if it's possible to spawn new HttpRequest objects on the back-end.
Drop the filters, stick with one request (on the front-end and back-end), and spawn a new Thread for each task in your request.
Retrospective update for those interested:
In the end I spent one full day in the dark forest of HttpServletRequestWrapper and HttpServletResponseWrapper and built a fully functional batch filter that passes multiple requests to the servlet and aggregates the responses.
Unfortunately, this filter must be the last one in the chain, directly before the servlet, because subsequent filters are called only once.

Java's Jersey, RESTful API, and JSONP

This must have been answered previously, but my Google powers are off today and I have been struggling with this for a bit. We are migrating from an old PHP base to a Jersey-based JVM stack, which will ultimately provide a JSON-based RESTful API that can be consumed from many applications. Things have been really good so far and we love the easy POJO-to-JSON conversion. However, we are dealing with difficulties in Cross-Domain JSON requests. We essentially have all of our responses returning JSON (using @Produces("application/json") and the com.sun.jersey.api.json.POJOMappingFeature set to true) but for JSONP support we need to change our methods to return an instance of JSONWithPadding. This of course also requires us to add a @QueryParam("callback") parameter to each method, which will essentially duplicate our efforts, causing two methods to be needed to respond with the same data depending on whether or not there is a callback parameter in the request. Obviously, this is not what we want.
So we essentially have tried a couple different options. Being relatively new to Jersey, I am sure this problem has been solved. I read from a few places that I could write a request filter or I could extend the JSON Provider. My ideal solution is to have no impact on our data or logic layers and instead have some code that says "if there is a call back parameter, surround the JSON with the callback, otherwise just return the JSON". A solution was found here:
http://jersey.576304.n2.nabble.com/JsonP-without-using-JSONWithPadding-td7015082.html
However, that solution extends the Jackson JSON object, not the default JSON provider.
What are the best practices? If I am on the right track, what is the class for the default JSON filter that I can extend? Is there any additional configuration needed? Am I completely off track?
If all your resource methods return JSONWithPadding object, then Jersey automatically figures out if it should return JSON (i.e. just the object wrapped by it) or the callback as well based on the requested media type - i.e. if the media type requested by the client is any of application/javascript, application/x-javascript, text/ecmascript, application/ecmascript or text/jscript, then Jersey returns the object wrapped by the callback. If the requested media type is application/json, Jersey returns the JSON object (i.e. does not wrap it with the callback). So, one way to make this work is to make your resource method produce all the above media types (including application/json), always return JSONWithPadding and let Jersey figure out what to do.
If this does not work for you, let us know why it does not cover your use case (at users at jersey.java.net). Anyway, in that case you can use ContainerRequest/ResponseFilters. In the request filter you can modify the request headers any way you want (e.g. adjust the accept header) to ensure it matches the right resource method. Then in the response filter you can wrap the response entity using the JSONWithPadding depending on whether the callback query param is available and adjust the content type header.
So what I ultimately ended up doing (before Martin's great response came in) was creating a Filter and a ResponseWrapper that intercepted the output. The basis for the code is at http://docs.oracle.com/cd/B31017_01/web.1013/b28959/filters.htm
Essentially, the filter checks to see if the callback parameter exists. If it does, it prepends the callback to the outputted JSON and appends the ) at the end. This works great for us in our testing, although it has not been hardened yet. While I would have loved for Jersey to be able to handle it automatically, I could not get it to work with jQuery correctly (probably something on my side, not a problem with Jersey). We have pre-existing jQuery calls and we are changing the URLs to look at the new Jersey Server and we really didn't want to go into each $.ajax call to change any headers or content types in the calls if we didn't have to.
Aside from the small issue, Jersey has been great to work with!
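The wrapping step that such a filter performs on the captured output can be sketched as below. This is not the code from the answer: the class name is made up, and the check that the callback looks like a plain JavaScript identifier is an added assumption, included because echoing an unchecked query parameter into a script response is risky.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
final class Jsonp {
    // Assumption: callbacks are dotted JavaScript identifiers such as "cb" or "jQuery.cb123".
    private static final Pattern SAFE_CALLBACK =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*(\\.[A-Za-z_$][A-Za-z0-9_$]*)*");

    private Jsonp() {}

    /** Returns json unchanged when no callback is given, otherwise callback(json). */
    static String wrap(String json, String callback) {
        if (callback == null || callback.isEmpty()) {
            return json;
        }
        if (!SAFE_CALLBACK.matcher(callback).matches()) {
            throw new IllegalArgumentException("Invalid callback name: " + callback);
        }
        return callback + "(" + json + ")";
    }
}
```

In the filter, `callback` would come from request.getParameter("callback"), and the content type would typically be switched to application/javascript when wrapping is applied.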

Struts and logging HTTP POST request body

I'm trying to log the raw body of HTTP POST requests in our application based on Struts, running on Tomcat 6. I've found one previous post on SO that was somewhat helpful, but the accepted solution doesn't work properly in my case. The problem is, I want to log the POST body only in certain cases, and let Struts parse the parameters from the body after logging. Currently, in the Filter I wrote I can read and log the body from the HttpServletRequestWrapper object, but after that Struts can't find any parameters to parse, so the DispatchAction call (which depends on one of the parameters from the request) fails.
I did some digging through Struts and Tomcat source code, and found that it doesn't matter if I store the POST body into a byte array, and expose a Stream and a Reader based on that array; when the parameters need to get parsed, Tomcat's Request object accesses its internal InputStream, which has already been read by that time.
Does anyone have an idea how to implement this kind of logging correctly?
In fact, Struts doesn't parse the parameters, it relies on the Servlet container to do that. And once the container has read the inputStream to create the parameters Map, of course there is nothing left to read. And in the Tomcat implementation, if you read the inputStream first, then the getParameter* family of methods has nothing left to work on, since, as you correctly note, it doesn't use getInputStream or getReader but accesses internally its optimized reader.
So your only solution in your ServletRequestWrapper is to override getInputStream, getReader, AND the getParameter* family on which Struts relies to read the parameters. Maybe you can have a look at org.apache.catalina.util.RequestUtil to not duplicate the POST body parsing part.
What you have to do in your filter is read the POST content in its entirety; then, when you pass the request on to the chain, back the input stream with your own. For example, you read the POST body to a file on disk, then when you call:
chain.doFilter(new ServletRequest() {}, response);
You can delegate most methods invocations of your class to the original request, but when it comes time to opening the input stream you need to read from your file on disk.
You need to make sure you don't leak resources as this will be invoked quite frequently and can hurt if done incorrectly.
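The "read once, replay later" part can be sketched without any servlet classes. Note that this version keeps the body in memory rather than in a file on disk, which is a simplification and only reasonable for small bodies; the class name is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper for illustration.
final class CachedBody {
    private final byte[] bytes;

    private CachedBody(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Drains the given stream completely and keeps a copy of its content. */
    static CachedBody readFrom(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new CachedBody(buf.toByteArray());
    }

    /** Returns a fresh stream over the cached content; may be called repeatedly. */
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }

    /** Decodes the cached content, e.g. for logging. */
    String asString(Charset charset) {
        return new String(bytes, charset);
    }

    int length() {
        return bytes.length;
    }
}
```

A request wrapper would log asString(...) and return a stream based on newStream() from its getInputStream() override. As noted in the other answer, on Tomcat the getParameter* methods would also have to be overridden to parse the cached body.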
The filter example linked in the question looks good and ought to work. Maybe you're defining it in web.xml after the Struts dispatcher filter. It would then indeed be too late to parse and log the request body and still make it available for Struts. You need to declare this filter before the Struts dispatcher filter. Filter ordering matters: filters are invoked in the order in which they're defined in web.xml.

How do I separate out query string params from POST data in a java servlet

When you get a doGet or doPost call in a servlet you can use getParameterXxx() to get either the query string or the post data in one easy place.
If the call was a GET, you get data from the url/query string.
If the call was a POST, you get the post data all parsed out for you.
Except, as it turns out, if you don't put an 'action' attribute in your form tag.
If you specify a fully or partially qualified URL for the action attribute, everything works great. If you don't, the browser submits to the same URL as the previous page submit, and if that URL happens to carry query string data, you'll get it along with the POST data, with no way to tell them apart.
Or is there?
I'm looking through the request object, I see where the post data comes from, I'm just trying to figure out where the GET data comes from, so I can erase the GET data on a post call and erase the post data on a GET call before it parses it out if possible.
Any idea what the safe way to do this is?
And lemme guess: you never tried to not put an action field in a form tag. :-)
You're right, I never tried not to put an action field in a form tag ;-) and I wouldn't, because of exactly what you're talking about. (Also, I think it's not valid HTML)
I don't know of any "clean" way to distinguish between GET and POST parameters, but you can access the raw query string using the getQueryString() method of HttpServletRequest, and you can access the raw POST data using the getInputStream() method of ServletRequest. (I'm looking at the Tomcat API docs specifically here, although I think those are both part of the standard Servlet API) Then you could parse the POST data and GET data separately if you want. They will (or should normally) both be formatted the same way, i.e.
name1=value1&name2=value2&...
though possibly with the ampersands replaced by semicolons (which you can technically do in HTTP/1.1, I didn't know that until recently)
In HTML 4, action is REQUIRED (HTML5 made it optional), so I guess the behavior will vary among clients.
The HttpServletRequest.getParameterxxx() methods don't distinguish between GET and POST parameters. If you really need to distinguish between them, you'll need to parse them manually using getQueryString() for the GET parameters and getInputStream()/getReader() for the POST data.
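Since both sources use the same name1=value1&name2=value2 format, one small parser can serve for each of them separately. The following is a minimal sketch using only the JDK; the class name is made up, UTF-8 is assumed as the encoding, and the semicolon separator mentioned above is not handled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration.
final class FormParams {
    private FormParams() {}

    /** Parses a form-encoded string into names and their values, in order of appearance. */
    static Map<String, List<String>> parse(String encoded) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return params;
        }
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.computeIfAbsent(decode(name), k -> new ArrayList<>()).add(decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // assumption: UTF-8 encoded parameters
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

The GET parameters would then come from FormParams.parse(request.getQueryString()), and the POST parameters from parsing the body read via getReader(), provided nothing has called a getParameter method on the request beforehand.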
I would write a ServletFilter and decorate the request object to clean things up a bit (using what Hilton suggested above). This is the classic decorator pattern in an intercepting filter.
