I am writing a client-server program in JAVA in which I am sending a file from server to client.As the file size may be quite high therefore I decided to divide the file in 5 parts and then send it to the same client in 5 different Threads.
My Algorithm is to use Java Zip API and create a zip file of the file to be sent,then I will divide the Zip file into 5 parts.
The problem is that there is not method in [ZIP API][2] that could divide the file.
This is the tutorial that I am referring for sending files through Thread.
Anyone who can guide me is there anything wrong with my Algorithm Or do I have to do with different strategy?
You should separate the zipping part from the splitting part. If you have to send these to a client, you probably don't want to keep the complete zip file in memory while you wait for the client to request the next chunk... so the simplest approach would be to zip to disk first, and then serve that file in chunks. At that point, it really doesn't matter that it's a zip file at all - and indeed for certain files types (e.g. images, sound, video) you may not want to go via a zip file at all.
I would suggest you tell the client the file name and size, and then let the client request whatever section of the file it wants. It can then decide what chunk size to use: you just need to seek to the right bit of the file and serve as much data as the client has requested.
Breaking up the file isn't a ZIP function. You could create multiple byte arrays from the resulting zip file (by segmenting the array) and sending each segment in a different thread. This would be similar to what download managers of yesteryear would do.
The client would then have code to re-assemble the byte array in the correct order. You'd probably need to add some additional information to each segment like the correct sequence, the filename to be restored, and the number of segments expected.
Related
For our project we are using Azure File Storage, in which large files (at most 500 MB) can be uploaded and must be processed by Java microservices (based on Spring Boot), by using Azure SDK for Java, that periodically polls the directory to see if new files have been uploaded.
Is it possible, in some ways, to determine when the uploaded file is completely uploaded, without the obvious solutions like monitoring the size?
Unfortunately it is not directly possible to monitor when a file upload has been completed (including monitoring the size). This is because the file upload happens in two stages:
First, an empty file of certain size is created. This maps to Create File REST API operation.
Next, content is written to that file. This maps to Put Range REST API operation. This is where the actual data is written to the file.
Assuming data is written to the file in sequential order (i.e. from byte 0 to file size), one possibility would be to keep on checking last "n" number of bytes of the file and see if all of them are non-zero bytes. That would indicate some data has been written at the end of the file. Again, this is not a fool-proof solution as there may be a case where last "n" bytes are genuinely zero.
I have a file which I need to upload to a service, and parse into relevant data. The parser and the uploader both require an InputStream. Ought I to open the file twice? I could save the file to a String, but having many of these files in memory is concerning.
EDIT: Thought I should make it clear that the parsing and uploading are entirely separate processes.
Since you are parsing it already it would be most efficient to load the file into a string. Parse it into indexes to the string, you will save memory and can just upload the string whenever you want to. This would be the most effective way, with memory but maybe not processing time.
A reply to one of the comments above.
Separate processes does not mean different threads or processes, just they do not need each other to operate.
I developing an application like Internet Download Manager for Android.
I want to know how to download different parts of file in Android as it is done in IDM.
How can I get the metadata of file before download and how to download files in parts?
There is no username-password or any restrictions in downloading... just simple download by url.
Assuming you're using HTTP for the download, you'll want to use the HEAD http verb and RANGE http header.
HEAD will give you the filesize (if available), and then RANGE lets you download a byte range.
Once you have the filesize, divide it into roughly equal sized chunks and spawn download thread for each chunk. Once all are done, write the file chunks in the correct order.
EDIT:
If you don't know how to use the RANGE header, here's another SO answer that explains how: https://stackoverflow.com/a/6323043/1355166
Is it possible to download large files (>=1Gb) from a servlet to an applet using HttpClient? And what servlet-side lib is useful in this case? Is there another way to approach this?
Any server-side lib that allows you access to the raw output stream should be just fine.
Servlets or JAX-RS for example.
Get the output stream, get the input stream of your file, use a nice big buffer (4k maybe) and pump the bytes from input to output.
On the client side, your applet needs access to the file system. I assume you don't want to keep the 1GB in memory. (maybe we want to stream it to the screen, in which case you don't need elevated access).
Avoid client libraries that try to fully materialize the returned content before handing it to.
Example code here:
Streaming large files in a java servlet
There are 2 servers that are geographically very far from each other.
One server does file processing, then saves the processed file in a directory:
c:\processed\
Files can be 100-1GB in size.
The 2nd server is to download these files.
What techniques can I use to check if the file correctly downloaded?
Is a checksum all I need to do? will it hash according to the contents of the file or just the file attributes? (or what is best practise)
If the file is 1GB, will creating the checksum take a long time?
Checksum is fine to make sure that the downloaded data matches the source data. For a discussion of making it fast, see What is the fastest way to create a checksum for large files in C#.