I want to implement an advanced Java servlet filter for processing batch requests on an API server, something similar to the Facebook batch request API. The idea is:
set up a servlet filter on a given URL
override doFilter(request, response), and in it:
parse the list of partial requests from the body, and for each:
prepare the partial request
call chain.doFilter(partialRequest, partialResponse)
remember the partial response
render the response with the list of partial responses
I am able to construct an HttpServletRequestWrapper for each partial request, and create an HttpServletResponseWrapper with some output stream cheating, but this is a bit hard: I have to change almost all parts, path, body, headers, etc.
Is there any good library for request/response manipulation, or a better request/response wrapper class?
I understand that you want to consolidate as many requests as possible into one, but I don't think you would de-consolidate them on the back-end.
I think your approach complicates things, and I'm not even sure if it's possible to spawn new HttpRequest objects on the back-end.
Drop the filters, stick with one request (on the front-end and back-end), and spawn a new Thread for each task in your request.
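For illustration, a rough sketch of that idea using an ExecutorService rather than raw Threads (the task and result types are invented for the example, not taken from the question):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchTaskRunner {

    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Runs every task contained in the single batch request concurrently
    // and collects the individual results for the aggregated response.
    public List<String> runAll(List<Callable<String>> tasks) throws InterruptedException {
        List<String> results = new ArrayList<>();
        for (Future<String> future : pool.invokeAll(tasks)) {
            try {
                results.add(future.get());
            } catch (ExecutionException e) {
                results.add("error: " + e.getCause().getMessage());
            }
        }
        return results;
    }
}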
Retrospective update for those who are interested:
I finally spent a full day diving into the HttpServletRequestWrapper and HttpServletResponseWrapper dark forest and built a fully functional batch filter that feeds multiple requests to the servlet and aggregates the responses.
Unfortunately, this filter must be the last filter in the chain, just before the servlet, because subsequent filters are only invoked once.
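For anyone heading down the same path, here is a rough, simplified sketch of the response-capturing half of the trick (Servlet 3.0-era javax API assumed; the class name and getBody() helper are mine, not from the actual filter). On Servlet 3.1+ the anonymous ServletOutputStream would also need isReady() and setWriteListener() overrides.

import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

// Captures whatever the servlet writes for one partial request so it can be aggregated later.
public class PartialResponseWrapper extends HttpServletResponseWrapper {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final ServletOutputStream stream = new ServletOutputStream() {
        @Override
        public void write(int b) {
            buffer.write(b);
        }
    };
    private final PrintWriter writer = new PrintWriter(buffer);

    public PartialResponseWrapper(HttpServletResponse response) {
        super(response);
    }

    @Override
    public ServletOutputStream getOutputStream() {
        return stream;
    }

    @Override
    public PrintWriter getWriter() {
        return writer;
    }

    // Body produced by the servlet for this partial request, ready to be merged into the batch response.
    public String getBody() {
        writer.flush();
        return buffer.toString();
    }
}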
I have multiple async SOAP requests and I need to mock the responses of those requests. However, it seems the order of the expected requests matters, and I was wondering if there are any workarounds.
I came across this example using MockRestServiceServer where it uses the ignoreExpectOrder() method.
You can write your own implementation of ResponseCreator that sets up a number of URIs and responses (for asynchronous requests) and returns the matching response based on the input URI. I've done something similar and it works.
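For example, here is a rough sketch of such a ResponseCreator keyed by request URI (the class name, the register() helper and the XML content type are all invented for the example; wiring it into MockRestServiceServer together with ignoreExpectOrder() is left out):

import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.client.ClientHttpRequest;
import org.springframework.http.client.ClientHttpResponse;
import org.springframework.mock.http.client.MockClientHttpResponse;
import org.springframework.test.web.client.ResponseCreator;

// Returns a canned response chosen by the URI of the incoming (possibly out-of-order) request.
public class UriMappingResponseCreator implements ResponseCreator {

    private final Map<URI, String> responsesByUri = new HashMap<>();

    public UriMappingResponseCreator register(String uri, String body) {
        responsesByUri.put(URI.create(uri), body);
        return this;
    }

    @Override
    public ClientHttpResponse createResponse(ClientHttpRequest request) throws IOException {
        String body = responsesByUri.get(request.getURI());
        if (body == null) {
            return new MockClientHttpResponse(new byte[0], HttpStatus.NOT_FOUND);
        }
        MockClientHttpResponse response =
                new MockClientHttpResponse(body.getBytes(StandardCharsets.UTF_8), HttpStatus.OK);
        response.getHeaders().setContentType(MediaType.TEXT_XML);
        return response;
    }
}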
My production servers have hundreds of users per block, and I've realized that exporting data could potentially blow up memory and ruin the app for multiple users.
We are talking about millions of records being exported by a single user.
Is there a way to create a CSV file and stream it to the front end as it is being generated to use as little memory as possible?
Making the front end request batches and generate the CSV file itself is not an option; this call will be used by other platforms, and I'm trying to make it as clean as possible for all of them.
If you look at the Spring Framework Documentation on Spring Web MVC, section 1.4.3. Handler Methods, sub-section Return Values, you will find many ways to return streaming data, e.g.
void - A method with a void return type (or null return value) is considered to have fully handled the response if it also has a ServletResponse or an OutputStream argument, or an @ResponseStatus annotation.
ResponseBodyEmitter - Emit a stream of objects asynchronously to be written to the response with HttpMessageConverter implementations; also supported as the body of a ResponseEntity. See Async Requests and HTTP Streaming.
That means you can do it in one of two ways:
Synchronous: Write the raw response yourself to the HTTP response stream in the handler method. Response is complete when method returns.
Asynchronous: Prepare the streaming (including HTTP headers) in your handler method, then do the actual streaming in another thread.
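As a hedged sketch of the synchronous option (Spring 5 / javax.servlet assumed; the controller name and the paged loadPage() query are placeholders), writing the CSV batch by batch and flushing as it goes, so only one batch is ever held in memory:

import java.io.PrintWriter;
import java.util.List;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CsvExportController {

    // Synchronous variant: a void handler writing straight to the servlet response.
    @GetMapping(value = "/export/users.csv", produces = "text/csv")
    public void exportUsers(HttpServletResponse response) throws Exception {
        response.setContentType("text/csv");
        response.setHeader("Content-Disposition", "attachment; filename=\"users.csv\"");

        PrintWriter out = response.getWriter();
        out.println("id,name,email");

        int page = 0;
        List<String[]> batch;
        while (!(batch = loadPage(page++)).isEmpty()) { // placeholder paged/streamed DB query
            for (String[] row : batch) {
                out.println(String.join(",", row));
            }
            out.flush(); // push this chunk to the client before loading the next one
        }
    }

    // Stand-in for a repository call that reads one page of rows at a time.
    private List<String[]> loadPage(int page) {
        return List.of();
    }
}

The asynchronous ResponseBodyEmitter variant follows the same pattern, except the writing happens on another thread after the handler method has returned.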
There are many resources available on the internet where PUT vs POST is discussed, but I could not understand how that would affect the Java implementation or the back-end implementation done underneath for a RESTful service. The links I viewed are mentioned below:
https://www.keycdn.com/support/put-vs-post/
https://spring.io/understanding/REST#post
https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
http://javarevisited.blogspot.com/2016/10/difference-between-put-and-post-in-restful-web-service.html
For example, let's say there is a RESTful web service for Address.
So POST /addresses will do the job of updating the Address, and PUT /addresses/1 will do the job of creating one.
Now, how can the HTTP methods PUT and POST control what the web service code is doing behind the scenes?
A PUT /addresses/1 may end up creating multiple entries of the same address in the DB.
So my question is, why is idempotent behavior linked to the HTTP method?
How will you control the idempotent behavior by using specific HTTP methods? Or is it just a guideline or standard practice that is suggested?
I am not looking for an explanation of what idempotent behavior is, but what makes us tag these HTTP methods so?
This is HTTP-specific, as the RFC you linked states: https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html (see up-to-date RFC links at the bottom of this answer). It is not described as part of REST itself: https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Now you wrote,
I am not looking for an explanation of what idempotent behavior is, but what makes us tag these HTTP methods so?
An idempotent operation always has the same result (I know you know that), but the result is not the same thing as the HTTP response. It should be obvious that, from an HTTP perspective, multiple requests with any method, even with all the same parameters, can have different responses (e.g. containing timestamps), so the responses can actually differ.
What should not change is the result of the operation. So calling PUT /addresses/1 multiple times should not create multiple addresses.
As you see, it's called PUT, not CREATE, for a reason. It may create the resource if it does not exist. If it exists, it may overwrite it with the new version (update); and if the new version is exactly the same, it should do nothing on the server and produce the same answer as the original request (because it may be the same request repeated after the previous one was interrupted and the client did not receive the response).
Compared to SQL, PUT would be more like INSERT OR UPDATE, not only INSERT or UPDATE.
So my question is, why is idempotent behavior linked to the HTTP method?
It is linked to the HTTP method so that some services (proxies) know that, in case of a request failure, they can safely repeat it (not in the sense of a safe HTTP method, but in the sense of idempotence).
How will you control the idempotent behavior by using specific HTTP methods?
I'm not sure what you are asking for.
But:
GET, HEAD just return data; they do not change anything (apart from maybe some logs, stats, metadata), so they are safe and idempotent.
POST, PATCH can do anything; they are neither safe nor idempotent.
PUT, DELETE are not safe (they change data), but they are idempotent, so it is "safe" to repeat them.
This basically means that safe requests can be made by proxies, caches, web crawlers, etc. without changing anything. Idempotent requests can be repeated by software and it will not change the outcome.
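To make the practical consequence concrete, here is a small sketch of the kind of retry logic a generic client or proxy could apply (Java 11 HttpClient; the single-retry policy is invented for the example):

import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;

public class IdempotentRetryingClient {

    private static final Set<String> IDEMPOTENT = Set.of("GET", "HEAD", "OPTIONS", "PUT", "DELETE");

    private final HttpClient client = HttpClient.newHttpClient();

    public HttpResponse<String> send(HttpRequest request) throws IOException, InterruptedException {
        try {
            return client.send(request, HttpResponse.BodyHandlers.ofString());
        } catch (IOException e) {
            // Repeating an idempotent request cannot change the outcome on the server,
            // so it is safe to retry once after a connection failure.
            if (IDEMPOTENT.contains(request.method())) {
                return client.send(request, HttpResponse.BodyHandlers.ofString());
            }
            // A POST might already have been applied, so we must not blindly repeat it.
            throw e;
        }
    }
}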
Or is it just a guideline or standard practice that is suggested?
It is a "standard". Maybe the RFC is not a standard yet, but it will eventually be one, and we don't have anything else we could (and should) follow.
Edit:
As the RFC mentioned above is outdated, here are some references to the current RFCs on this topic:
Retrying idempotent requests by client or proxy: https://www.rfc-editor.org/rfc/rfc7230#section-6.3.1
Pipelining idempotent requests: https://www.rfc-editor.org/rfc/rfc7230#section-6.3.2
Idempotent methods in HTTP: https://www.rfc-editor.org/rfc/rfc7231#section-4.2.2
Thanks to Roman Vottner for the suggestion.
So my question is, why is idempotent behavior linked to the HTTP method?
I am not looking for an explanation of what idempotent behavior is, but what makes us tag these HTTP methods so?
So that generic, domain-agnostic participants in the exchange of messages can make useful contributions.
RFC 7231 calls out a specific example in its definition of idempotent
Idempotent methods are distinguished because the request can be repeated automatically if a communication failure occurs before the client is able to read the server's response. For example, if a client sends a PUT request and the underlying connection is closed before any response is received, then the client can establish a new connection and retry the idempotent request. It knows that repeating the request will have the same intended effect, even if the original request succeeded, though the response might differ.
A client, or intermediary, doesn't need to know anything about your bespoke API, or its underlying implementation, to act this way. All of the necessary information is in the specification (RFC 7231's definitions of PUT and idempotent), and in the server's announcement that the resource supports PUT.
Note that idempotent request handling is required of PUT, but it is not forbidden for POST. It's not wrong to have an idempotent POST request handler, or even one that is safe. But generic components, that have only the metadata and the HTTP spec to work from, will not know or discover that the POST request handler is idempotent.
I could not understand how that would affect the Java implementation or the back-end implementation done underneath for a RESTful service?
There's no magic; using PUT doesn't automatically change the underlying implementation of the service; technically, it doesn't even constrain the underlying implementation. What it does do is clearly document where the responsibility lies.
It's analogous to Fielding's 2002 observation about GET being safe
HTTP does not attempt to require the results of a GET to be safe. What
it does is require that the semantics of the operation be safe, and
therefore it is a fault of the implementation, not the interface
or the user of that interface, if anything happens as a result that
causes loss of property (money, BTW, is considered property for the
sake of this definition).
An important thing to realize is that, as far as HTTP is concerned, there is no "resource hierarchy". There's no relationship between /addresses and /addresses/1 -- for example, messages to one have no effect on cached representations of the other. The notion that /addresses is a "collection" and /addresses/1 is an "item in the /addresses collection" is an implementation detail, private to the origin server.
(It used to be the case that the semantics of POST would refer to subordinate resources, see for example RFC 1945; but even then the spelling of the identifier for the subordinate was not constrained.)
I mean, is PUT /employee acceptable, or does it have to be PUT /employee/<employee-id>?
PUT /employee has the semantics of "replace the current representation of /employee with the representation I'm providing". If /employee is a representation of a collection, it is perfectly fine to modify that collection by PUTting a new representation of it.
GET /collection
200 OK
{/collection/1, /collection/2}
PUT /collection
{/collection/1, /collection/2, /collection/3}
200 OK
GET /collection
200 OK
{/collection/1, /collection/2, /collection/3}
PUT /collection
{/collection/4}
200 OK
GET /collection
200 OK
{/collection/4}
If that's not what you want, i.e. you want to append to the collection rather than replace the entire representation, then PUT has the wrong semantics when applied to the collection. You either need to PUT the item representation to an item resource, or you need to use some other method on the collection (POST or PATCH are suitable).
GET /collection
200 OK
{/collection/1, /collection/2}
PUT /collection/3
200 OK
GET /collection
200 OK
{/collection/1, /collection/2, /collection/3}
PATCH /collection
{ op: add, path: /4, ... }
200 OK
GET /collection
200 OK
{/collection/1, /collection/2, /collection/3, /collection/4 }
How will you control the idempotent behavior by using specific HTTP methods? Or is it just a guideline or standard practice that is suggested?
It is more about the HTTP specification, and an app must follow that specification. Nothing stops you from altering the behavior on the server side.
There is always a difference between a web service and a RESTful web service.
Consider some legacy apps that use servlets. Servlets have doGet and doPost methods. doPost was always recommended over doGet for security reasons and for storing data on the server/DB, as the info is embedded in the request body itself and is not exposed to the outside world.
Even there, nothing stops you from saving data in doGet or returning some static pages in doPost; hence it's all about following the underlying specs.
So my question is, why is idempotent behavior linked to the HTTP method?
Because the HTTP specification says so:
4.2.2. Idempotent Methods
A request method is considered "idempotent" if the intended effect on
the server of multiple identical requests with that method is the
same as the effect for a single such request. Of the request methods
defined by this specification, PUT, DELETE, and safe request methods
are idempotent.
(Source: RFC 7231)
Why do the specs say this?
Because it is useful, when implementing HTTP-based systems, to be able to distinguish idempotent and non-idempotent requests, and the method provides a good way to make that distinction.
Because other parts of the HTTP specification are predicated on being able to distinguish idempotent methods; e.g. proxies and caching rules.
How will you control the idempotent behavior by using specific HTTP methods?
It is up to the server to implement PUT & DELETE to have idempotent behavior. If a server doesn't, then it is violating the HTTP specification.
Or is it just a guideline or standard practice that is suggested?
It is required behavior.
Now you could ignore the requirement (there are no protocol police!), but if you do, it is liable to cause your systems to break, especially if you need to integrate them with systems implemented by other people, who might write their client code assuming that if it replays a PUT or DELETE, your server won't say "error".
In short, we use specifications like HTTP so that our systems are interoperable. But this strategy only works properly if everyone's code implements the specifications correctly.
POST is generally not idempotent, so multiple calls will create multiple objects with different IDs.
PUT, given the id in the URI, would apply a "create or update" query to your database; thus, once the resource is created, every subsequent call makes no difference to the backend state.
I.e. there is a clear difference in the backend in how you create new or update existing stored objects. E.g. assuming you are using MySQL and an auto-generated ID:
POST will end up as an INSERT query
PUT will end up as an INSERT ... ON DUPLICATE KEY UPDATE query
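A rough sketch of how that difference could look in JDBC (table and column names are invented; MySQL's INSERT ... ON DUPLICATE KEY UPDATE is assumed, as in the answer above):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AddressDao {

    // POST /addresses -> plain INSERT: every call creates a new row with a new auto-generated id.
    public void create(Connection con, String street) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO address (street) VALUES (?)")) {
            ps.setString(1, street);
            ps.executeUpdate();
        }
    }

    // PUT /addresses/{id} -> create-or-update keyed by the id from the URI:
    // repeating the same request leaves the table in exactly the same state.
    public void createOrUpdate(Connection con, long id, String street) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO address (id, street) VALUES (?, ?) "
                        + "ON DUPLICATE KEY UPDATE street = VALUES(street)")) {
            ps.setLong(1, id);
            ps.setString(2, street);
            ps.executeUpdate();
        }
    }
}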
Normally in REST APIs, we use:
POST - Add data
GET - get data
PUT - update data
DELETE - delete data
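For illustration, a minimal self-contained sketch of that conventional mapping (Spring MVC annotations; an in-memory map stands in for the real data store):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/addresses")
public class AddressCrudController {

    private final Map<Long, String> store = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    // POST - add data: not idempotent, every call creates a new resource with a new id.
    @PostMapping
    public long create(@RequestBody String address) {
        long id = sequence.incrementAndGet();
        store.put(id, address);
        return id;
    }

    // GET - get data: safe and idempotent.
    @GetMapping("/{id}")
    public String get(@PathVariable long id) {
        return store.get(id);
    }

    // PUT - update (or create) the resource at a known id: idempotent,
    // repeating the same request leaves the same single entry behind.
    @PutMapping("/{id}")
    public void update(@PathVariable long id, @RequestBody String address) {
        store.put(id, address);
    }

    // DELETE - delete data: idempotent, deleting twice leaves the same state as deleting once.
    @DeleteMapping("/{id}")
    public void delete(@PathVariable long id) {
        store.remove(id);
    }
}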
Read the post below to get more of an idea.
REST API Best Practices
I would like to send several large files via HTTP using multipart/form-data.
I actually just want to stream them through my service, so I'd like to get the different parts as streams, and I would absolutely like to avoid the whole request being buffered in memory before I get a chance to pass the data on.
I get the feeling that with Jetty (we're using Dropwizard 0.7.1, which comes with Jetty 9.0.7) the whole request gets buffered before my code is executed.
Is there a way to avoid that? Ideally, I'd like to have an event-based system (which fires an event like "next part with name xxx" and gives me a stream I can consume).
A request with multipart/form-data is processed by various internal components to break apart the sections so that HttpServletRequest.getParts() (and various similar methods) can work properly.
Option #1: Handle Multi-part yourself
It can be a bit tricky to subvert this behavior of the Servlet spec, but I'll give it a go.
First, do not declare the @MultipartConfig configuration for the servlet you want to handle this request data.
Next, do not access methods in the HttpServletRequest that need to know about the parameters of the request, or its parts.
Override the HttpServlet.service(HttpServletRequest, HttpServletResponse) method, not the doPost() method, and process the raw Request payload content yourself.
This means you'll be writing a multipart InputStream parser and handling the parsing of the multipart body yourself. There are plenty of examples of this online; pick one that makes the most sense to you.
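A bare-bones sketch of how Option #1 can be wired up (the servlet name and the consumeParts() placeholder are mine; the actual boundary parsing, which you would write or borrow, is deliberately left out):

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Deliberately no @MultipartConfig, and no calls to getParameter()/getParts(),
// so the container never buffers or parses the multipart body on our behalf.
public class StreamingMultipartServlet extends HttpServlet {

    @Override
    protected void service(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String contentType = req.getContentType(); // contains the multipart boundary
        try (InputStream raw = req.getInputStream()) {
            // Hand the raw, unbuffered stream to your own multipart parser and fire a
            // "next part" callback per section as it is read off the wire.
            consumeParts(contentType, raw);
        }
        resp.setStatus(HttpServletResponse.SC_OK);
    }

    // Placeholder for the hand-written (or library-backed) multipart parser.
    private void consumeParts(String contentType, InputStream raw) throws IOException {
        byte[] buffer = new byte[8192];
        while (raw.read(buffer) != -1) {
            // parse boundary, part headers and part content here instead of just draining
        }
    }
}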
Option #2: Don't use POST with multi-part
If you are uploading a file as a stream, don't use POST with multipart; use PUT with raw payload data, and you'll skip the entire layer of magic that is the multipart POST payload.
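And a hedged sketch of Option #2, where the raw PUT body is streamed straight to disk (the target directory and file naming are placeholders):

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class RawUploadServlet extends HttpServlet {

    @Override
    protected void doPut(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // No multipart layer involved: the body is the file, so it streams straight through.
        Path target = Paths.get("/tmp/uploads", "upload-" + System.nanoTime());
        Files.createDirectories(target.getParent());
        try (InputStream in = req.getInputStream()) {
            Files.copy(in, target);
        }
        resp.setStatus(HttpServletResponse.SC_CREATED);
    }
}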
I'm trying to log the raw body of HTTP POST requests in our application based on Struts, running on Tomcat 6. I've found one previous post on SO that was somewhat helpful, but the accepted solution doesn't work properly in my case. The problem is, I want to log the POST body only in certain cases, and let Struts parse the parameters from the body after logging. Currently, in the Filter I wrote I can read and log the body from the HttpServletRequestWrapper object, but after that Struts can't find any parameters to parse, so the DispatchAction call (which depends on one of the parameters from the request) fails.
I did some digging through Struts and Tomcat source code, and found that it doesn't matter if I store the POST body into a byte array, and expose a Stream and a Reader based on that array; when the parameters need to get parsed, Tomcat's Request object accesses its internal InputStream, which has already been read by that time.
Does anyone have an idea how to implement this kind of logging correctly?
In fact, Struts doesn't parse the parameters, it relies on the Servlet container to do that. And once the container has read the inputStream to create the parameters Map, of course there is nothing left to read. And in the Tomcat implementation, if you read the inputStream first, then the getParameter* family of methods has nothing left to work on, since, as you correctly note, it doesn't use getInputStream or getReader but accesses internally its optimized reader.
So your only solution in your ServletRequestWrapper is to override getInputStream, getReader, AND the getParameter* family on which Struts relies to read the parameters. Maybe you can have a look at org.apache.catalina.util.RequestUtil to not duplicate the POST body parsing part.
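For reference, here is a rough sketch of that wrapper (Servlet 2.5-era API, as on Tomcat 6; the getParameter* overrides the answer mentions are only indicated by a comment, since they amount to re-implementing the container's parameter parsing):

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import javax.servlet.ServletInputStream;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

// Caches the POST body so it can be logged and still be re-read by later consumers of the stream.
public class LoggingRequestWrapper extends HttpServletRequestWrapper {

    private final byte[] body;

    public LoggingRequestWrapper(HttpServletRequest request) throws IOException {
        super(request);
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        InputStream in = request.getInputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            copy.write(buffer, 0, n);
        }
        this.body = copy.toByteArray();
    }

    // The raw body, available for selective logging in the filter.
    public String getBodyForLogging() {
        return new String(body);
    }

    @Override
    public ServletInputStream getInputStream() {
        final ByteArrayInputStream source = new ByteArrayInputStream(body);
        return new ServletInputStream() {
            @Override
            public int read() {
                return source.read();
            }
        };
    }

    @Override
    public BufferedReader getReader() {
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(body)));
    }

    // As noted above, getParameter(), getParameterMap(), getParameterNames() and getParameterValues()
    // would also have to be overridden to parse the cached body, because Tomcat's own parameter
    // parsing will find the original stream already consumed.
}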
What you have to do in your filter is read the POST content in its entirety; then, when you pass the request on to the chain, back the input stream with your own. For example, you read the POST body to a file on disk, and then you call:
chain.doFilter(yourWrappedRequest, response);
You can delegate most method invocations of your wrapper class to the original request, but when it comes time to open the input stream, you need to read from your file on disk.
You need to make sure you don't leak resources as this will be invoked quite frequently and can hurt if done incorrectly.
The filter example linked in the question looks good and ought to work. Maybe you're defining it in web.xml after the Struts dispatcher filter; it would then indeed be too late to parse and log the request body and still make it available to Struts. You need to declare this filter before the Struts dispatcher filter. Filter ordering matters: filters are invoked in the order in which they are defined in web.xml.