How should I serve ZIPped webpages?

How should I serve ZIPped webpages? - java

Background:
Our software generates reports for customers in the usual suspect formats (HTML, PDF, etc.) and each report can contain charts and other graphics unique to that report. For PDFs everthing is held in one place - the PDF file itself. HTML is trickier as the report is basically the sum of more than 1 file. The files are available via HTTP through Tomcat.
Problem:
I really want to have a tidy environment and wrap the HTML reports into a single file. There's MTHML, Data URIs, several formats to consider. This excellent question posits that, given the lack of cross-broser support for these formats, ZIP is a neat solution. This is attractive to me as I can also offer the zip for download as a "HTML report you can email" option. (In the past users have complained about losing the graphics about when they set about emailling HTML reports)
The solution seems simple. A request comes in, I locate the appropriate zip, unpack it somewhere on the webserver, point the request at the new HTML file, and after a day or so tidy everything up again.
But something doesn't quite seem right about that. I've kind of got a gut feeling that it's not a good solution, that there's something intrisically wrong with it, or that maybe a better way exists that I can't see at the moment.
Can anyone suggest whether this is good or bad, and offer an alternative solution?
Edit for more background information!
The reports need to persist on the server. Our customers are users at sites, and the visibility of a single report could be as wide as everyone at the site. The creation process involves the user selecting the criteria for the report, and submitting it for creation to the server. Data is extracted from the database and a document built. A placeholder record goes into the database, and the documents themselves get stored on the fileserver somewhere. It's the 'documents on the fileserver' part that I'd like to be tidier - zipping also means less disk space used!. Once a report is created, it is available to anyone who can see it.

I would have thought the plan would be that the zip file ends up on the client rather than staying on the server.
Without knowing about your architecture, I would guess at an approach like this:
User requests report
Server displays report as HTML
User perhaps tweaks some parameters, repeats request
Server displays report as HTML (repeat until user is happy)
On each of the HTML reports, there's a "download as zip" link
User clicks on link
Server regenerates report, stores it in a zip file and serves it to the user
User saves zip file somewhere, emails it around etc - server isn't involved at all
This relies on being able to rerun the report to generate the zip file, of course. You could generate a zip file each time you generate some HTML, but that's wasteful if you don't need to do it, and requires clean-up etc.
Perhaps I've misunderstood you though... if this doesn't sound appropriate, could you update your question?
EDIT: Okay, having seen the update to your question, I'd be tempted to store the files for each report in a separate directory (e.g. using a GUID as the directory name). Many file systems support compression at the file system level, so "premature zipping" probably wouldn't save much disk space, and would make extracting individual files harder. Then if the user requests a zip, you just need to build the zip file at that point, probably just in memory, before serving it.

Once a report is created, it is
available to anyone who can see it.
that is quite telling - it means that the reports are sharable, and you also would like to "cache" reports so that it doesnt have to be regenerated.
one way to do this would be to work out a way to hash the parameters together, in such a way that different parameter combinations (that result in different a report) hash to different values. then, you can use those hash as a key into a large cache of reports stored in disk in zip (may be the name of the file is the hash?)
that way, every time someone requests a report, you hash the parameters, and check if that report was already generated, and serve that up, either as a zip download, or, you can unzip it, and serve up the html as per normal. If the report doesnt exist, generate it, and zip it, make sure to be able to identify it later on as being produced by these parameters (i.e., record the hash).
one thing to be careful of is that file system writes tends to be non-atomic, so if you are not careful, you will regenerate the report twice, which sucks, but luckily in your case, not too harmful. to avoid, you can use a single thread to do it (slower), or implement some kind of lock.

You dont need to physically create zip files on a file system. Theres nothing wrong with creating the zips in memory, stream it to the browser and let GC take care of releasing the memory taken by the temporary zip. This of course introduces problems as it could be potentially ineffecient to continnally recreate the zip each time a request is made. However judge these things according to your needs and so on.

Related

Make a downloadable file available in an API that returns JSON

I am developing an API in Java. It is basically a java servlet that returns content in json (application/json). Using a Tomcat server. One of the field in the response is supposed to be a link to a downloadable .txt file.
I wonder what is the best way to deliver this file:
Generating this file on every request seems to me killer, even having some cron to clean directories with files
Any way to give a temporary link only while that request for a period without saving to the file system?
Thank you.

If you say writing to the file system would kill your application, then I deduce from that that your IO performance is too weak for that, right? I mean, if you even would not have the storage capacity for that, then your infrastructure is not suitable for your application at all. I can only see four other ways for solving that problem (but maybe there are more, my list is not exclusive):
Store the text file in a database. The database should also store timeout information. Good if there are more than 1 application servers and there is a load balancer in front of them (but all application servers share the same database).
Store the text file in RAM, maybe using a cache library which does the cleanup tasks automatically for you - but be aware that a cache library will usually not guarantee a minimum storage time for each file.
Do not store the text file at all, but create it just when it is requested (no idea if that is possible in your application).
Do not provide a link to the text file, but directly include its content in the json answer (of course it would then be escaped as a JSon String), which means your server can directly forget about it when the answer has been sent, but the client _must_ download it without checking if it needs the file or not.

Prevent access to files generated by a Java application

I'm working on an application that generates a series of files based on user input that would be later uploaded to a cloud service.
It is important that these files remain intact (prevent modification or deletion) before they are uploaded.
What would be the best way to accomplish this? I could work with Windows only solutions. I know that a sufficiently motivated advanced user would be able to do it anyways but I'm looking for solutions to obscure or just make it harder than 'Select all > Delete'.
All these generated files will be within the same folder.
Thanks in advance!
Edit: I previously stated that I didn't mind the files could be read but that would enable copying and storing them somewhere else and that is undesirable too.

Creating a file for download (client side vs. backend)

I need to allow for csv-file downloads on my page and I was going to try ngCsv (from Angular) but for browser support this seems fairly limited. I've seen quite a few examples of this being done with vanilla Javascript. And after a discussion with a colleague of "backend vs. frontend" I'm feeling more and more unsure of what to do.
Are there any true optimization/efficiency reasons why I should avoid doing this on the client side (assuming the files are no more than 100MB each download)?

Are there any true optimization/efficiency reasons why I should avoid
doing this on the client side (assuming the files are no more than
100MB each download)?
If the data on the .csv would be the same for each user, and only updated every now and then, I would suggest you have your server create / update a static .csv. It wouldn't be resource-intensive, and you wouldn't have to worry about browser compatibility / user resources.
If, however, the data you need to create a .csv for is different on a per-user basis, then you should consider creating the file client-side. If you can help it, you don't want your server having to dynamically generate 100MB .csv files each time a user clicks the link.
You could write a script that only generates the .csv client-side if the browser is not mobile and there is web-worker support. If either of those conditions are not met, you could fall back to having your server do it.
Ultimately, your answer is going to really depend on the requirements / context of this project. Try to cache the results where possible, and use common sense. Good luck :)

Web application that deals with files -upload download

I have an existing swing desktop application that I wish to convert to a web application. The first thing that is stopping from doing so is that the desktop application deals with writing and reading from PDF files. Also the user fills up the PDF forms which needs to be read by the application.
Now a typical use case in the desktop application is like, the user logs in opens a PDF form and fills it up. The swing application manages where the file is stored so it goes to the file and reads the form, extracts the data and stores the data in the db. The user might not fill up the form all in one go. He might save it come back to it later and continue.
All of this needs to be done by the web app now. My problem is I don't want the user to download and upload the form multiple times to the server. That would eat the bandwidth and also asking the use to save the file locally and upload it back once he completes filling the form doesn't appeal to me since the desktop application nicely used to manage the location of these files as well.
Would I need to implement something like a dropbox kind of thing? A small deamon running continuously to check what file has been updated and upload it to the server? That would be difficult since at the server I wouldn't know if the file was latest or not. Is there anything like this that someone might have done before?

I have anther suggestion: why don't you show the user a form with the same fields and transfer them to the PDF after the user submits. This way the Pdf does not leave the server and you transmit just the minimal amount of data.

Switching to a web-version of the application may force you to re-think some of the way you are doing things. Certainly browsers are intentionally limited in their access to the local file system which introduces a major hurdle for your current mode of operation.
Even if you could display the PDF within a browser, detect the completion of edits and send this back to the server from within browser code (which is probably possible), you'll be subject to different browsers doing different (strange) things with whatever pdf plugin is installed.
As Vitaliy mentioned already, switching being able to populate a (web) form in the browser means that whole download upload problem goes away. But then you have to take what the user has done in a web page and pump that into a PDF somehow. If you don't HAVE to start with a PDF, but could collect the data and produce a PDF at the end then you might have more options. For example you could use iText to create a PDF directly if you don't have too many styles of document to work with. You could use something like Docmosis which you can give templates to and get it to populate and render PDFs later. With the Docmosis option you can also ask Docmosis for the list of fields in the template so could build a web form based on the selected template, let the user fill it in, then push that data to Docmosis to produce the file.
Hopefully there's some options there that are useful to you.

Adobe documents how to do this here. Never underestimate the power of design-by-google. I know this can be made to work because I've used PDF forms on line.

I worked on a similar issue a few years ago, although I wasn't dealing with signed forms. The signature definitely makes it a little more difficult. However, I was able to use iText to create a PDF form with fields, and only submit the field data back to the server. Offhand, I unfortunately do not remember exactly what/how we did it, but can confirm it is doable (with limitations/caveats). Ex: User had to have PDF reader plugin installed & User was forced to d/l the pdf every time.
Basically what I did was use iText to create an FDF from a PDF (with form existing form fields). The submit button in the FDF actually submits the form data to a URL of your choosing (not unlike an HTML form). Once you have that data, I believe I merged the form fields (from the FDF) with the PDF on the server side using iText.
Once you use the server to maintain all the form data, the synchronization/locking process you use to ensure that a single user is updating the latest and greatest form data is up to you.
Your comment under jowierun indicates that you want to deal with word/excel/etc docs as well, so I am not entirely sure I am understanding your needs. Your initial post discussed the needs to fill out PDF forms locally, but afterwards it sounds like you are looking for a file-sharing system instead.
Can you please clarify exactly what you are trying to accomplish?

Java - How to find that the user has changed the configuration file?

I am developing a Java Desktop Application. This app needs a configuration to be started. For this, I want to supply a defaultConfig.properties or defaultConfig.xml file with the application so that If user doesn't select any configuration, then the application will start with the help of defaultConfig file.
But I am afraid of my application crash if the user accidentally edit the defaultConfig file. So Is there any mechanism through which I can check before the start of the application that whether the config file has changed or not.
How other applications (out in the market) deal with this type of situation in which their application depends on a configuration file?
If the user edited the config file accidentally or intentionally, then the application won't run in future unless he re-installs the application.

I agree with David in that using a MD5 hash is a good and simple way to accomplish what you want.
Basically you would use the MD5 hashing code provided by the JDK (or somewhere else) to generate a hash-code based on the default data in Config.xml, and save that hash-code to a file (or hardcode it into the function that does the checking). Then each time your application starts load the hash-code that you saved to the file, and then load the Config.xml file and again generate a hash-code from it, compare the saved hash-code to the one generated from the loaded config file, if they are the same then the data has not changed, if they are different, then the data has been modified.
However as others are suggesting if the file should not be editable by the user then you should consider storing the configuration in a manner that the user can not easily edit. The easiest thing I can think of would be to wrap the Output Stream that you are using to write the Config.xml file in a GZIP Output Stream. Not only will this make it difficult for the user to edit the configuration file, but it will also cause the Config.xml file to take up less space.

I am not at all sure that this is a good approach but if you want to go ahead with this you can compute a hash of the configuration file (say md5) and recompute and compare every time the app starts.
Come to think of it, if the user is forbidden to edit a file why expose it? Stick it in a jar file for example, far away from the user's eyes.

If the default configuration is not supposed to be edited, perhaps you don't really want to store it in a file in the first place? Could you not store the default values of the configuration in the code directly?

Remove write permissions for the file. This way the user gets a warning before trying to change the file.
Add a hash or checksum and verify this before loading file
For added security, you can replace the simple hash with a cryptographic signature.

From I have found online so far there seems to be different approaches code wise. none appear to be a 100 hundred percent fix, ex:
The DirectoryWatcher implements
AbstractResourceWatcher to monitor a
specified directory.
Code found here twit88.com develop-a-java-file-watcher
one problem encountered was If I copy
a large file from a remote network
source to the local directory being
monitored, that file will still show
up in the directory listing, but
before the network copy has completed.
If I try to do almost anything non
trivial to the file at that moment
like move it to another directory or
open it for writing, an exception will
be thrown because really the file is
not yet completely there and the OS
still has a write lock on it.
found on the same site, further below.
How the program works It accepts a ResourceListener class, which is FileListener. If a change is detected in the program a onAdd, onChange, or onDelete event will be thrown and passing the file to.
will keep searching for more solutions.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.