How to read Big files (~300 MB) from Google Cloud Storage? - java

I can easily upload/write or read the contents of small files (~80 KB) in Google Cloud Storage.
Now I have to perform a bigger task while serving large files (~200-300 MB):
1) Read the contents of the uploaded file in chunks (~10 KB); I want to modify each chunk programmatically.
2) Repeat step 1 until the stream has read the whole content of the file (from start to end, sequentially).
I tried this, but the response only contains part of the data. How can I perform this task?

You should not use the Files API (which is deprecated; see the comment at the top of the page you mentioned). Instead, use the GCS client library (mentioned in the deprecation notice). The GCS client allows you to read continuously, and you can serialize the state of the GcsInputChannel between requests until the read is complete (if the read takes longer than the request timeout). You should also consider using the MapReduce library with GoogleCloudStorageLineInput for reading the file and writing the modified one in your mapper (probably map-only in your case).
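For example, a minimal sketch with the GCS client library, reading the object in roughly 10 KB chunks through a GcsInputChannel (the bucket, object, and processChunk hook are placeholders):

```java
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsInputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

import java.io.IOException;
import java.nio.ByteBuffer;

public class ChunkedGcsReader {

    private static final int CHUNK_SIZE = 10 * 1024; // ~10 KB per read

    public void readInChunks(String bucket, String object) throws IOException {
        GcsService gcsService =
                GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
        GcsFilename fileName = new GcsFilename(bucket, object);

        // Open a read channel at offset 0. GcsInputChannel is Serializable,
        // so its state can be stored between requests if the full read does
        // not fit into a single request.
        GcsInputChannel channel = gcsService.openReadChannel(fileName, 0);

        ByteBuffer buffer = ByteBuffer.allocate(CHUNK_SIZE);
        int bytesRead;
        while ((bytesRead = channel.read(buffer)) != -1) {
            buffer.flip();
            byte[] chunk = new byte[buffer.remaining()];
            buffer.get(chunk);
            processChunk(chunk); // modify/handle the chunk here
            buffer.clear();
        }
        channel.close();
    }

    private void processChunk(byte[] chunk) {
        // placeholder for whatever per-chunk processing is needed
    }
}
```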

Related

Reading file on remote server via Java

I am developing a Java application through which I need to read the log files present on a server and perform operations depending on the content of the logs.
Files range from 3GB up to 9GB.
Here on Stack Overflow I have already read the discussion about reading large files with Java; I am attaching the link:
Java reading large file discussion
In that discussion the files are read locally;
in my case I have to retrieve and read the file on the server. Is there an efficient way to achieve this?
I would like to avoid having to download files given their size.
I had thought about using URL Reader to retrieve the files, but I have doubts about the speed of execution.
The files I need to recover are under the path C:\production\LOG\file.log
Do you have any suggestions or advice?
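If the server can expose the log over HTTP (or any other protocol that gives you an InputStream), one option, sketched below under that assumption and with a hypothetical URL, is to stream the file line by line rather than downloading it whole:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RemoteLogReader {

    public void scanLog(String logUrl) throws IOException {
        // e.g. "http://server/logs/file.log" (hypothetical endpoint)
        URL url = new URL(logUrl);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Only one line is held in memory at a time,
                // so even multi-GB logs can be scanned.
                handleLine(line);
            }
        }
    }

    private void handleLine(String line) {
        // placeholder for whatever analysis the application needs
    }
}
```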

How to determine if a file is complete on Azure File Storage in Java?

For our project we are using Azure File Storage, to which large files (at most 500 MB) can be uploaded. They must then be processed by Java microservices (based on Spring Boot) that use the Azure SDK for Java and periodically poll the directory to see if new files have been uploaded.
Is it possible, in some way, to determine when the uploaded file has been completely uploaded, without obvious solutions like monitoring the size?
Unfortunately it is not directly possible to monitor when a file upload has been completed (including monitoring the size). This is because the file upload happens in two stages:
First, an empty file of certain size is created. This maps to Create File REST API operation.
Next, content is written to that file. This maps to Put Range REST API operation. This is where the actual data is written to the file.
Assuming data is written to the file in sequential order (i.e. from byte 0 to the file size), one possibility would be to keep checking the last "n" bytes of the file and see whether all of them are non-zero. That would indicate some data has been written at the end of the file. Again, this is not a fool-proof solution, as there may be a case where the last "n" bytes are genuinely zero.
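A rough sketch of that heuristic, assuming the azure-storage-file-share (v12) SDK and an already-built ShareFileClient; the exact download overload may differ slightly between SDK versions:

```java
import com.azure.storage.file.share.ShareFileClient;
import com.azure.storage.file.share.models.ShareFileRange;

import java.io.ByteArrayOutputStream;

public class UploadCompletionCheck {

    // Heuristic only: returns true if the last n bytes of the file are all
    // non-zero, suggesting the tail of the file has been written.
    // Not fool-proof, as noted above.
    public boolean tailLooksWritten(ShareFileClient fileClient, int n) {
        long size = fileClient.getProperties().getContentLength();
        if (size < n) {
            return false;
        }
        ByteArrayOutputStream tail = new ByteArrayOutputStream(n);
        fileClient.downloadWithResponse(
                tail,
                new ShareFileRange(size - n, size - 1), // last n bytes
                false,                                   // no range MD5
                null,                                    // no timeout
                null);                                   // no context
        for (byte b : tail.toByteArray()) {
            if (b == 0) {
                return false;
            }
        }
        return true;
    }
}
```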

Java heap size error in mirth

I am using Mirth Connect 3.0.3 and I have an .xml file which is almost 85 MB in size and contains some device information. I need to read this .xml file and insert that data into the database (SQL Server).
The problem I am facing is that when I try to read the data, it shows a Java heap size error.
I increased the server memory to 1024 MB and the client memory to 1024 MB,
but it still shows the same error. If I increase the memory further, I am not able to start Mirth Connect.
Any suggestion is appreciated.
Thanks.
Is the XML file composed of multiple separate sections/pieces of data that would make sense to split up into multiple channel messages? If so, consider using a Batch Adapter. The XML data type has options to split based on element/tag name, node depth/level, or an XPath query. All of those options currently still require the message to be read into memory in its entirety, but it will still be more memory-efficient than processing the entire XML document as a single message.
You can also use a JavaScript batch script, in which case you're given a Java BufferedReader, and can use the script to read through the file and return a message at a time. In this case, you will not have to read the entire file into memory.
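The batch script itself is written in JavaScript, but the underlying pattern is just a loop over the BufferedReader that returns one record at a time. A rough Java sketch of that idea, assuming records are delimited by a hypothetical </device> closing tag:

```java
import java.io.BufferedReader;
import java.io.IOException;

public class RecordBatcher {

    // Reads one "message" at a time: lines are accumulated until the closing
    // tag of a record is seen, then returned. Returns null at end of file.
    // The tag "</device>" is a placeholder for whatever actually delimits records.
    public String nextMessage(BufferedReader reader) throws IOException {
        StringBuilder record = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            record.append(line).append('\n');
            if (line.contains("</device>")) {
                return record.toString();
            }
        }
        return record.length() > 0 ? record.toString() : null;
    }
}
```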
Are there large blobs of data in the message that don't need to be manipulated in a transformer? Like, embedded images, etc? If so, consider using an Attachment Handler. That way you can extract that data and store it once, rather than having it copied and stored multiple times throughout the message lifecycle (for Raw / Transformed / Encoded / etc.).

Append operation on google cloud storage file

I need to append string content to a Google Cloud Storage file.
Actually I'm doing string operations on content whose size is more than 30 MB.
I'm running my project on Google App Engine.
When I try to hold the entire content in a StringBuilder, it throws a Java heap space error on Google App Engine.
Is there any way to append the content to the Google Cloud Storage file instead of increasing the Java heap space?
Streaming data out in Java is usually accomplished through a Writer pattern. As you build your data, you "write" it out on a stream. Keeping the entire thing in memory leads to the problems you've experienced.
I'm not sure which library you're using to access Google Cloud Storage, but most of them provide a way to write objects a few bytes at a time.
App Engine's legacy Google Cloud Storage API provides an AppEngineFile object with a write() method that you can repeatedly invoke as you create your object. The new Google Cloud Storage Client Library provides a GcsOutputChannel that can do the same.
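For example, a minimal sketch with the GCS client library that writes the content out piece by piece instead of accumulating it all in a StringBuilder (bucket and object names are placeholders):

```java
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class ChunkedGcsWriter {

    public void writePieces(String bucket, String object, List<String> pieces) throws IOException {
        GcsService gcsService =
                GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
        GcsFilename fileName = new GcsFilename(bucket, object);

        // Open a write channel and push each piece of content as it is produced,
        // so only one piece needs to be in memory at any time.
        GcsOutputChannel channel =
                gcsService.createOrReplace(fileName, GcsFileOptions.getDefaultInstance());
        for (String piece : pieces) {
            channel.write(ByteBuffer.wrap(piece.getBytes(StandardCharsets.UTF_8)));
        }
        channel.close(); // the object becomes visible in GCS only after close()
    }
}
```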
I might have misunderstood your question, though. Are you asking about creating an object in Google Cloud Storage and then appending content to it afterwards? GCS does not allow for appending to files once they've been fully created, with the limited exception of the "compose" functionality (which would work, except that you can only compose an object a certain number of times before you reach a maximum composition level).

Downloading large files with HttpClient

Is it possible to download large files (>=1Gb) from a servlet to an applet using HttpClient? And what servlet-side lib is useful in this case? Is there another way to approach this?
Any server-side lib that allows you access to the raw output stream should be just fine.
Servlets or JAX-RS for example.
Get the output stream, get the input stream of your file, use a nice big buffer (4k maybe) and pump the bytes from input to output.
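A minimal servlet sketch of that copy loop (the file path is a placeholder):

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileDownloadServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("application/octet-stream");
        try (InputStream in = new FileInputStream("/data/bigfile.bin"); // placeholder path
             OutputStream out = resp.getOutputStream()) {
            byte[] buffer = new byte[4096]; // a "nice big buffer"
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // pump bytes from input to output
            }
        }
    }
}
```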
On the client side, your applet needs access to the file system. I assume you don't want to keep the 1 GB in memory (unless you want to stream it to the screen, in which case you don't need elevated access).
Avoid client libraries that try to fully materialize the returned content before handing it to you.
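On the applet side, a rough sketch with Apache HttpClient 4.x that streams the response body straight to disk so it is never fully held in memory (URL and target path are placeholders):

```java
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class LargeFileDownloader {

    public void download(String url, String targetPath) throws IOException {
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(new HttpGet(url))) {
            HttpEntity entity = response.getEntity();
            try (InputStream in = entity.getContent();
                 OutputStream out = new FileOutputStream(targetPath)) {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            }
        }
    }
}
```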
Example code here:
Streaming large files in a java servlet
