I am developing a Java application through which I need to read the log files present on a server and perform operations depending on the content of the logs.
Files range from 3GB up to 9GB.
I have already read the discussion on this site about reading large files with Java; here is the link:
Java reading large file discussion
In that discussion the files are read locally.
In my case I have to retrieve and read the file from the server. Is there an efficient way to achieve this?
I would like to avoid having to download files given their size.
I had thought about using URL Reader to retrieve the files, but I have doubts about the speed of execution.
The files I need to recover are under the path C:\production\LOG\file.log
Do you have any suggestions or advice?
Related
I don't know how to do this, or whether it is possible or wise, so any form of answer that points me to a library, example or reasoning will be helpful.
I need to upload and process some Java XML files (actually, XSLT files - XML Excel files).
I don't want to store the file on the server and then invoke processing on it. Instead, I want to stream the file in and process it as a stream.
I also want to be able to process multipart file uploads, but still process that as an input stream.
I am expressly trying to avoid creating a file on disk for this.
I have a file which I need to upload to a service, and parse into relevant data. The parser and the uploader both require an InputStream. Ought I to open the file twice? I could save the file to a String, but having many of these files in memory is concerning.
EDIT: Thought I should make it clear that the parsing and uploading are entirely separate processes.
Since you are parsing it already, it would be most efficient to load the file into a string. Parse it into indexes into that string; you save memory and can upload the string whenever you want to. This would be the most effective way in terms of memory, though maybe not in processing time.
A reply to one of the comments above.
"Separate processes" here does not mean different threads or OS processes, just that they do not need each other to operate.
According to the current requirement, users will upload large files, which they may want to download later. I cannot store the uploaded files in the DB because they are large and performance would suffer.
Does anyone know of a Java plugin that provides efficient file management on a web server and maintains a link to each file, so that the file can be downloaded when the link is requested? The code must also make sure that users can download only the files they uploaded themselves; they must not be able to download any other file just by modifying the download link. I am using Spring 3 as the framework.
Please suggest how to solve this problem.
If you have write access to the file system, why not just save them there?
You then generate a unique ID and save the ID/file relation in the DB. The ID then has to be supplied to a servlet, which serves the file.
Store the file content on a part of the filesystem outside the web application, so that it cannot be reached by changing the link.
Then store the path to that file in the DB, and return the file only if the user has permission to read it.
Be careful not to store all the files in the same folder, or the number of files in it could grow too large. Find a way to spread them across several folder levels.
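A minimal sketch of the "more folder levels" idea, assuming the generated ID is a random UUID and that its first characters are used as directory names. The class `ShardedStorage`, the methods `pathFor` and `newId`, and the `uploads` root are hypothetical names chosen for illustration; the mapping from ID to owner would still live in the DB as described above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class ShardedStorage {

    // Hypothetical helper: maps an ID to <root>/<chars 0-1>/<chars 2-3>/<id>,
    // so that no single directory ends up holding every uploaded file.
    static Path pathFor(Path root, String id) {
        if (id.length() < 4) {
            throw new IllegalArgumentException("id too short: " + id);
        }
        return root.resolve(id.substring(0, 2))
                   .resolve(id.substring(2, 4))
                   .resolve(id);
    }

    // Generates an opaque ID (32 hex characters) that is hard to guess.
    static String newId() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        // Assumed storage root, located outside the web application.
        Path root = Paths.get("uploads");
        String id = newId();
        System.out.println(id + " -> " + pathFor(root, id));
    }
}
```

The download servlet would look up the ID in the DB, check that the requesting user owns it, and only then resolve the path; the path itself is never exposed in the link.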
Is it possible to download large files (>= 1 GB) from a servlet to an applet using HttpClient? And what servlet-side lib is useful in this case? Is there another way to approach this?
Any server-side lib that allows you access to the raw output stream should be just fine.
Servlets or JAX-RS for example.
Get the output stream, get the input stream of your file, use a nice big buffer (4k maybe) and pump the bytes from input to output.
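The copy loop described above can be sketched as follows. This is an illustration only: `StreamPump` and `pump` are hypothetical names, and in a servlet the `OutputStream` would typically come from `response.getOutputStream()` while the `InputStream` would be opened on the file to serve.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamPump {

    // Copies everything from in to out through a fixed-size buffer, so memory
    // use stays at bufferSize bytes regardless of how large the file is.
    static long pump(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for the file and the servlet response.
        InputStream in = new ByteArrayInputStream(new byte[10000]);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(pump(in, out, 4096) + " bytes copied");
    }
}
```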
On the client side, your applet needs access to the file system. I assume you don't want to keep the 1GB in memory. (maybe we want to stream it to the screen, in which case you don't need elevated access).
Avoid client libraries that try to fully materialize the returned content before handing it to you.
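A sketch of the client side, assuming the response is written straight to disk rather than held in memory. It uses the JDK's `java.net.URL` and `Files.copy` instead of HttpClient, purely to keep the example self-contained; `StreamingDownload` and `download` are hypothetical names, and the demo uses a local `file:` URL so that it runs without a server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamingDownload {

    // Streams the content behind the URL directly into the target file.
    // Files.copy reads and writes in chunks, so the body is never fully in memory.
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the servlet URL: a temporary local file.
        Path src = Files.createTempFile("src", ".bin");
        Files.write(src, new byte[1024]);
        Path dst = Files.createTempFile("dst", ".bin");
        System.out.println(download(src.toUri().toURL(), dst) + " bytes written");
    }
}
```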
Example code here:
Streaming large files in a java servlet
There are 2 servers that are geographically very far from each other.
One server does file processing, then saves the processed file in a directory:
c:\processed\
Files can be 100-1GB in size.
The 2nd server is to download these files.
What techniques can I use to check if the file correctly downloaded?
Is a checksum all I need? Will it hash the contents of the file or just the file attributes? (Or what is best practice?)
If the file is 1GB, will creating the checksum take a long time?
Checksum is fine to make sure that the downloaded data matches the source data. For a discussion of making it fast, see What is the fastest way to create a checksum for large files in C#.
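As an illustration in Java, here is a minimal sketch that computes a SHA-256 digest over a file's contents (not its attributes) by streaming it through a buffer. The choice of SHA-256 and the names `FileChecksum` and `sha256Hex` are assumptions made for the example. Both servers would run the same computation and compare the resulting hex strings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FileChecksum {

    // Hashes the bytes of the file in 64 KB chunks; only the content is fed to
    // the digest, so timestamps and other attributes do not affect the result.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Small temporary file as a stand-in for a processed file.
        Path tmp = Files.createTempFile("checksum-demo", ".bin");
        Files.write(tmp, "abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(sha256Hex(tmp));
    }
}
```

Because the file is read once, sequentially, the time for a 1 GB file is usually dominated by disk read speed rather than by the hash computation itself.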